The optimum clustering framework: implementing the cluster hypothesis
Authors: Norbert Fuhr, Marc Lechtenfeld, Benno Stein, Tim Gollub
Institution: 1. University of Duisburg-Essen, Duisburg, Germany; 2. Bauhaus-Universität Weimar, Weimar, Germany
Abstract: Document clustering can support users in interactive retrieval, especially when they have difficulty specifying their information need precisely. In this paper, we present a theoretical foundation for optimum document clustering. The key idea is to base cluster analysis and evaluation on a set of queries, defining two documents as similar if they are relevant to the same queries. Our optimum clustering framework (OCF) has three essential components: (1) a set of queries, (2) a probabilistic retrieval method, and (3) a document similarity metric. After introducing an appropriate validity measure, we define optimum clustering with respect to the estimated relevance probabilities of the query-document pairs under consideration. We also show that well-known clustering methods are implicitly based on these three components, but rely on heuristic design decisions for some of them. We argue that our framework makes more targeted research on better document clustering methods possible. Experimental results demonstrate the potential of this approach.
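To make the query-based notion of similarity concrete, here is a minimal sketch, assuming the probabilistic retrieval method has already produced an estimate of P(relevant | q, d) for every query q in the query set and every document d. The similarity used here (the sum over queries of the product of the two relevance probabilities, i.e. the expected number of queries to which both documents are relevant under an independence assumption, plus a cosine-normalized variant) is an illustrative choice, not necessarily the metric defined in the paper. The names `RELEVANCE`, `query_similarity`, and `normalized_similarity` and all probability values are hypothetical.

```python
import math

# Hypothetical estimates of P(relevant | q, d) from some probabilistic
# retrieval method: one row per document, one column per query.
# All values are illustrative only, not taken from the paper.
RELEVANCE = {
    "d1": [0.9, 0.8, 0.1],
    "d2": [0.85, 0.7, 0.05],
    "d3": [0.1, 0.05, 0.9],
}


def query_similarity(p_i, p_j):
    """Expected number of queries to which both documents are relevant,
    assuming relevance judgments are independent given the query:
    sum over q of P(R|q,d_i) * P(R|q,d_j)."""
    return sum(a * b for a, b in zip(p_i, p_j))


def normalized_similarity(p_i, p_j):
    """Cosine-normalized variant (an assumption of this sketch), so that a
    document relevant to many queries is not similar to everything."""
    norm = math.sqrt(query_similarity(p_i, p_i) * query_similarity(p_j, p_j))
    return query_similarity(p_i, p_j) / norm if norm > 0 else 0.0


if __name__ == "__main__":
    for a, b in [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]:
        print(a, b, round(normalized_similarity(RELEVANCE[a], RELEVANCE[b]), 3))
```

Documents d1 and d2, which are likely relevant to the same two queries, come out far more similar than d1 and d3, which are likely relevant to different queries. Any standard clustering algorithm could then operate on this pairwise similarity instead of on term-based similarity.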
This article is indexed in SpringerLink and other databases.