Engineering a multi-purpose test collection for Web retrieval experiments
Authors: Peter Bailey, Nick Craswell, David Hawking
Institution: (a) Department of Computer Science, The Australian National University, Canberra, ACT 0200, Australia; (b) CSIRO Mathematics and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia
Abstract: Past research into text retrieval methods for the Web has been restricted by the lack of a test collection capable of supporting experiments which are both realistic and reproducible. The 1.69 million document WT10g collection is proposed as a multi-purpose testbed for experiments with these attributes, in distributed IR, hyperlink algorithms and conventional ad hoc retrieval. WT10g was constructed by selecting from a superset of documents in such a way that desirable corpus properties were preserved or optimised. These properties include: a high degree of inter-server connectivity, integrity of server holdings, inclusion of documents related to a very wide spread of likely queries, and a realistic distribution of server holding sizes. We confirm that WT10g contains exploitable link information using a site (homepage) finding experiment. Our results show that, on this task, Okapi BM25 works better on propagated link anchor text than on full text. WT10g was used in TREC-9 and TREC-2001, and both topic relevance and homepage finding queries and judgments are available.
Keywords: Web retrieval; Link-based ranking; Distributed information retrieval; Test collections
This article is indexed in ScienceDirect and other databases.