Bias and the limits of pooling for large collections |
| |
Authors: | Chris Buckley, Darrin Dimmick, Ian Soboroff, Ellen Voorhees |
| |
Institution: | (1) Sabir Research, Inc., Gaithersburg, MD 20878, USA; (2) Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899-8940, USA |
| |
Abstract: | Modern retrieval test collections are built through a process called pooling in which only a sample of the entire document
set is judged for each topic. The idea behind pooling is to find enough relevant documents such that when unjudged documents
are assumed to be nonrelevant the resulting judgment set is sufficiently complete and unbiased. Yet a constant-size pool represents
an increasingly small percentage of the document set as document sets grow larger, and at some point the assumption of approximately
complete judgments must become invalid. This paper shows that the judgment sets produced by traditional pooling when the pools
are too small relative to the total document set size can be biased in that they favor relevant documents that contain topic
title words. This phenomenon is wholly dependent on the collection size and does not depend on the number of relevant documents
for a given topic. We show that the AQUAINT test collection constructed in the recent TREC 2005 workshop exhibits this biased
relevance set; it is likely that the test collections based on the much larger GOV2 document set also exhibit the bias. The
paper concludes with suggested modifications to traditional pooling and evaluation methodology that may allow very large reusable
test collections to be built.
|
| |
Keywords: | Test collections; Pooling; Sampling bias |
This article is indexed in SpringerLink and other databases. |
|
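The pooling process the abstract describes can be illustrated with a minimal sketch. It assumes the common depth-k form of pooling, in which the top-ranked documents from each submitted run are merged and only that union is judged; the abstract itself does not specify the pooling depth or procedure. All names (`build_pool`, `precision_at_k`), the run data, and the collection size are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Union of the top `depth` documents from each run's ranking (one topic)."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def precision_at_k(ranking: List[str], judged_relevant: Set[str], k: int) -> float:
    """Precision at rank k, counting every unjudged document as nonrelevant."""
    top = ranking[:k]
    return sum(1 for doc in top if doc in judged_relevant) / k


# Hypothetical rankings from two runs for a single topic.
runs = {
    "run_a": ["d1", "d2", "d3", "d4"],
    "run_b": ["d2", "d5", "d1", "d6"],
}

# Only the pooled documents are shown to assessors.
pool = build_pool(runs, depth=2)

# A fixed-size pool covers a shrinking share of a growing collection.
collection_size = 1000  # assumed collection size
pool_fraction = len(pool) / collection_size

# Hypothetical assessor judgments, made over the pool only.
judged_relevant = {"d1", "d5"}

# A later run that did not contribute to the pool: d7 and d8 were never
# judged, so they are scored as nonrelevant whether or not they truly are.
new_run = ["d7", "d1", "d5", "d8"]
p4 = precision_at_k(new_run, judged_relevant, k=4)
```

In this toy setup the pool holds three documents, a tiny share of the assumed collection, and the later run is penalized for any relevant documents it retrieves that were never pooled. That is the reusability concern the abstract raises for very large document sets.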