

Parsimonious translation models for information retrieval
Authors:Seung-Hoon Na, In-Su Kang, Jong-Hyeok Lee
Institution:Division of Electrical and Computer Engineering, Pohang University of Science and Technology, POSTECH, AITrc, San 31, Hyojadong, Namgu, Pohang, Kyeongbook 790784, Republic of Korea
Abstract:In the KL-divergence framework, the extended language modeling approach faces the critical problem of estimating a query model, the probabilistic model that encodes the user’s information need. For query expansion in initial retrieval, the translation model has been proposed as a way to incorporate term co-occurrence statistics. However, the translation model is difficult to apply because the term co-occurrence statistics must be constructed offline. For a large collection in particular, constructing such a large matrix of term co-occurrence statistics makes time and space complexity prohibitive. In addition, reliable retrieval performance cannot be guaranteed because the translation model may include noisy non-topical terms from documents. To resolve these problems, this paper investigates an effective method for constructing co-occurrence statistics and eliminating noisy terms by employing a parsimonious translation model. The parsimonious translation model is a compact version of a translation model that reduces the number of terms with non-zero probabilities by eliminating non-topical terms in documents. Through experiments on seven different test collections, we show that the query model estimated from the parsimonious translation model significantly outperforms not only the baseline language model but also the non-parsimonious models.
Keywords:Information retrieval; Language model; Parsimonious translation model; Query expansion
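
The abstract does not give the estimation procedure, so the following is only a minimal sketch of the general idea behind a parsimonious model: terms that the background collection already explains well lose probability mass, and terms that fall below a cutoff are dropped. It assumes the commonly used EM formulation against a collection model, which may differ from the authors' exact method. The function name `parsimonious_model`, the mixing weight `lam`, the cutoff `threshold`, and the toy probabilities are all hypothetical.

```python
from collections import Counter


def parsimonious_model(doc_tokens, collection_prob, lam=0.1,
                       threshold=1e-3, iterations=50):
    """Estimate a compact document model by EM against a background model.

    Assumption: a generic parsimonious language model sketch, not
    necessarily the procedure used in the paper.
    """
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # Start from the maximum-likelihood document model.
    p_doc = {w: c / total for w, c in tf.items()}

    for _ in range(iterations):
        # E-step: expected count of each term attributed to the document
        # model rather than to the background collection model.
        expected = {}
        for w, p in p_doc.items():
            doc_part = lam * p
            bg_part = (1.0 - lam) * collection_prob.get(w, 1e-9)
            expected[w] = tf[w] * doc_part / (doc_part + bg_part)

        norm = sum(expected.values())
        if norm == 0:
            break

        # M-step: renormalize, then drop terms below the cutoff.
        kept = {w: e / norm for w, e in expected.items()
                if e / norm >= threshold}
        if not kept:
            break
        kept_norm = sum(kept.values())
        p_doc = {w: p / kept_norm for w, p in kept.items()}

    return p_doc


# Toy usage: common words such as "the" are well explained by the
# background model, so they lose mass relative to topical terms.
doc = "the the the the retrieval model translation the of of".split()
background = {"the": 0.5, "of": 0.3, "retrieval": 0.01,
              "model": 0.02, "translation": 0.005}
compact = parsimonious_model(doc, background)
```

The resulting dictionary has fewer non-zero entries than the raw term-frequency model, which is the property the abstract credits with reducing both storage cost and noise when co-occurrence statistics are built from documents.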
This article is indexed in ScienceDirect and other databases.