

Using Web structure and summarisation techniques for Web content mining
Institution:1. Department of Library and Information Science, Yonsei University, Republic of Korea;2. Pancreatobiliary Cancer Clinic, Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Republic of Korea;1. Department of Marketing and Electronic Business, School of Business (Management), Nanjing University, China;2. School of Information Technologies, The University of Sydney, Australia;3. Department of Information Systems, National University of Singapore, Singapore;1. Department of Economics and IRES, National University of Singapore, Singapore;2. Department of Real Estate and IRES, National University of Singapore, Singapore;1. Department of Information Management, National Sun Yat-Sen University, No. 70, Lienhai Road, Kaohsiung 80424, Taiwan, ROC;2. Department of Information Management, Chia-Nan University of Pharmacy and Science, No. 60, Erh-Jen Road, Sec. 1, Jen-Te, Tainan, Taiwan, ROC;3. Department of Information Science and Management System, National Taitung University, No. 684, Sec. 1, Chunghua Road, Taitung, Taiwan, ROC;1. University of North Dakota, Grand Forks, ND, USA;2. Metropolitan State University, Saint Paul, MN, USA
Abstract: The dynamic nature and size of the Internet can make relevant information difficult to find. Most users express their information need as short queries to search engines, and they often have to manually sift through search results ordered by the engines' relevance ranking, which makes relevance judgement time-consuming. In this paper, we describe a novel representation technique that uses Web structure together with summarisation techniques to better represent the knowledge in actual Web documents. We call the proposed technique the Semantic Virtual Document (SVD). We discuss how the SVD can be used with a suitable clustering algorithm to achieve automatic content-based categorisation of similar Web documents. The auto-categorisation facility, together with a “tree-like” Graphical User Interface (GUI) for post-retrieval document browsing, enhances the relevance judgement process for Internet users. Furthermore, we introduce a cluster-biased automatic query expansion technique that can be used to overcome the ambiguity of the short queries users typically give. We outline our experimental design for evaluating the effectiveness of the proposed SVD representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining.
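The abstract does not give the construction of the SVD, the clustering algorithm, or the query expansion procedure, so the following is only an illustrative sketch of the general pipeline it describes, under stated assumptions. It assumes that a virtual document is a crude extractive summary of a page combined with the anchor texts of links pointing to it, that documents are compared by TF-IDF cosine similarity, that a single-pass threshold clustering stands in for the unnamed "suitable clustering algorithm", and that query expansion appends the heaviest terms of a chosen cluster. All function names and parameters are hypothetical and are not taken from the paper or from iSEARCH.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def summarise(text, max_sentences=1):
    """Crude extractive summary (assumption): keep the sentences whose
    words have the highest total frequency in the page."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))
    ranked = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in tokenize(s)),
                    reverse=True)
    return " ".join(ranked[:max_sentences])


def virtual_document(page_text, inlink_anchor_texts):
    """Assumed representation: page summary plus in-link anchor texts."""
    return summarise(page_text) + " " + " ".join(inlink_anchor_texts)


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF keeps every weight strictly positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.2):
    """Single-pass threshold clustering (a stand-in, not the paper's method).
    Each cluster is a list of document indices; the first index is its seed."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def expand_query(query, cluster_vectors, k=2):
    """Append the k heaviest terms of the chosen cluster that are not
    already in the query (one possible reading of 'cluster-biased')."""
    totals = Counter()
    for vec in cluster_vectors:
        for term, weight in vec.items():
            totals[term] += weight
    present = set(tokenize(query))
    ranked = sorted((t for t in totals if t not in present),
                    key=lambda t: (-totals[t], t))
    return " ".join([query] + ranked[:k])
```

In use, one would build a virtual document per search result, vectorise them with `tfidf_vectors`, group them with `cluster`, and pass the vectors of the cluster the user selects to `expand_query`. The similarity threshold and the number of expansion terms are arbitrary here and would need tuning on real data.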
This article is indexed in ScienceDirect and other databases.