Term norm distribution and its effects on Latent Semantic Indexing
Institution: 1. Center for Studies of Information Resources, Wuhan University, Bayi Rd. 299, Wuhan, Hubei 430072, China; 2. School of Information Management, Wuhan University, Bayi Rd. 299, Wuhan 430072, China
Abstract: Latent Semantic Indexing (LSI) uses the singular value decomposition to reduce noisy dimensions and improve the performance of text retrieval systems. Preliminary results have shown modest improvements in retrieval accuracy and recall, but these studies have mainly examined small collections. In this paper we investigate text retrieval on a larger document collection (TREC) and focus on the distribution of word norms (magnitudes). Our results indicate that word representations in LSI space are inadequate on large collections. We emphasize the query expansion interpretation of LSI and propose an LSI term normalization that achieves better performance on larger collections.
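
The ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's method: the abstract does not specify the normalization, so the sketch assumes that a term's representation is its row of the truncated left singular matrix U_k, that its "norm" is the Euclidean length of that row, and that normalization rescales each row to unit length. The function names (`lsi_terms`, `normalize_rows`, `expand_query`) are hypothetical.

```python
import numpy as np

def lsi_terms(A, k):
    """Truncated SVD of a term-by-document matrix A (terms are rows).

    Returns U_k (one row per term), the top-k singular values,
    and Vt_k (one column per document).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def normalize_rows(X, eps=1e-12):
    """Rescale every row to unit Euclidean length (assumed normalization)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def expand_query(q, term_vecs):
    """Query expansion view of LSI: project q into the k-dimensional
    space and map it back to term space, so related terms gain weight."""
    return term_vecs @ (term_vecs.T @ q)

# Toy collection: 4 terms x 3 documents (made-up counts).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
U_k, s_k, Vt_k = lsi_terms(A, k=2)

term_norms = np.linalg.norm(U_k, axis=1)  # the distribution the abstract studies
q = np.array([1.0, 0.0, 0.0, 0.0])        # query containing only term 0

scores_plain = expand_query(q, U_k) @ A                  # standard LSI scores
scores_norm = expand_query(q, normalize_rows(U_k)) @ A   # with assumed normalization
```

With the unnormalized U_k, scoring the expanded query against the original matrix gives the same result as scoring the original query against the rank-k approximation of A, which is the query expansion interpretation the abstract refers to. Terms whose rows of U_k have small norm contribute little to the expansion; the unit-length rescaling is one simple way to counteract that.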
This article is indexed in ScienceDirect and other databases.