Some Formal Analysis of Rocchio's Similarity-Based Relevance Feedback Algorithm期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

按检索

Some Formal Analysis of Rocchio's Similarity-Based Relevance Feedback Algorithm

Authors:	Zhixiang Chen Binhai Zhu

Institution:	(1) Department of Computer Science, University of Texas-Pan American, 1201 West University Drive, Edinburg, TX 78539, USA;(2) Department of Computer Science, Montana State University, Bozeman, MT 59717, USA

Abstract:	Rocchio's similarity-based Relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive supervised learning algorithm from examples. In spite of its popularity in various applications there is little rigorous analysis of its learning complexity in literature. In this paper we show that in the binary vector space model, if the initial query vector is 0, then for any of the four typical similarities (inner product, dice coefficient, cosine coefficient, and Jaccard coefficient), Rocchio's similarity-based relevance feedback algorithm makes at least n mistakes when used to search for a collection of documents represented by a monotone disjunction of at most k relevant features (or terms) over the n-dimensional binary vector space {0, 1}ⁿ. When an arbitrary initial query vector in {0, 1}ⁿ is used, it makes at least (n + k – 3)/2 mistakes to search for the same collection of documents. The linear lower bounds are independent of the choices of the threshold and coefficients that the algorithm may use in updating its query vector and making its classification.

Keywords:	relevance feedback vector space supervised learning similarity lower bound
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏