An information retrieval model based on vector space method by supervised learning
Authors: Xiaoying Tai, Fuji Ren, Kenji Kita
Institution: 1. Tencent AI Lab, Shenzhen 518060, China; 2. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; 1. Postgraduate Program in Pharmaceutical Sciences, Federal University of Paraná, Curitiba, Paraná, Brazil; 2. Pediatric Endocrinology Unit, Department of Pediatrics, Federal University of Paraná, Curitiba, Paraná, Brazil; 1. Fujian Provincial Key Laboratory of Environment factors and Cancer, School of Public Health, Fujian Medical University, Fujian, China; 2. Department of Epidemiology and Health Statistic, School of Public Health, Fujian Medical University, Fujian, China; 3. Department of internal medicine, Guangdong Women and Children's hospital, Guangzhou Medical University, 521 Xingnan Road, Guangzhou, China
Abstract: This paper proposes a method to improve the retrieval performance of the vector space model (VSM), in part by using user-supplied information about which documents are relevant to a given query. In addition to the user's relevance feedback, information such as the original document similarities is incorporated into the retrieval model, which is built from a sequence of linear transformations. The high-dimensional, sparse vectors are then reduced by singular value decomposition (SVD) and mapped into a low-dimensional vector space, namely the space representing the latent semantic meanings of words. The method has been tested on two test collections, the Medline collection and the Cranfield collection. To train the model, multiple partitions are created for each collection. Compared with the latent semantic indexing (LSI) model, the improvements in average precision, averaged over all partitions, are 20.57% (Medline) and 22.23% (Cranfield) on the training data of the two collections, and 0.47% (Medline) and 4.78% (Cranfield) on the test data. The proposed method makes it possible to preserve user-supplied relevance information in the system over the long term so that it can be used later.
Keywords: Information retrieval; Supervised learning; Vector space model; Relevance feedback; Singular value decomposition; Linear transformation
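
The SVD reduction step described in the abstract can be illustrated with a minimal sketch of plain LSI, the baseline the paper compares against. This is not the authors' supervised model: the relevance-feedback transformations are not specified in the abstract and are omitted here. The toy term-document matrix, the query, and the function names (lsi_fit, lsi_project, cosine_rank) are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are documents.
# Terms: retrieval, vector, query, protein, gene
TERM_DOC = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
# Hypothetical query containing the terms "retrieval" and "query".
QUERY = np.array([1.0, 0.0, 1.0, 0.0, 0.0])


def lsi_fit(term_doc, k):
    """Truncated SVD of a terms x documents matrix, keeping k dimensions."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Each row is one document expressed in the k-dimensional latent space
    # (equivalent to U_k^T applied to that document's term vector).
    doc_vecs = Vt[:k, :].T * s_k
    return U_k, s_k, doc_vecs


def lsi_project(U_k, term_vector):
    """Map a term-space vector (e.g. a query) into the latent space."""
    return term_vector @ U_k


def cosine_rank(doc_vecs, q_vec):
    """Return document indices sorted by cosine similarity, plus the scores."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    safe_norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    scores = (doc_vecs @ q_vec) / safe_norms
    return np.argsort(-scores), scores


if __name__ == "__main__":
    U_k, s_k, doc_vecs = lsi_fit(TERM_DOC, k=2)
    order, scores = cosine_rank(doc_vecs, lsi_project(U_k, QUERY))
    print("ranking:", order.tolist())
    print("scores:", np.round(scores, 3).tolist())
```

In this toy example the second document does not contain the term "query", yet it scores as highly as the first because both load on the same latent dimension. That is the kind of latent semantic matching the low-dimensional space is meant to provide.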
This article is indexed in ScienceDirect and other databases.