An information retrieval model based on vector space method by supervised learning |
| |
Authors: | Xiaoying Tai, Fuji Ren, Kenji Kita |
| |
Institution: | 1. Tencent AI Lab, Shenzhen 518060, China; 2. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; 1. Postgraduate Program in Pharmaceutical Sciences, Federal University of Paraná, Curitiba, Paraná, Brazil; 2. Pediatric Endocrinology Unit, Department of Pediatrics, Federal University of Paraná, Curitiba, Paraná, Brazil; 1. Fujian Provincial Key Laboratory of Environment factors and Cancer, School of Public Health, Fujian Medical University, Fujian, China; 2. Department of Epidemiology and Health Statistic, School of Public Health, Fujian Medical University, Fujian, China; 3. Department of internal medicine, Guangdong Women and Children's hospital, Guangzhou Medical University, 521 Xingnan Road, Guangzhou, China |
| |
Abstract: | This paper proposes a method to improve the retrieval performance of the vector space model (VSM), in part by using user-supplied information about which documents are relevant to a given query. In addition to this relevance feedback, information such as the original document similarities is incorporated into the retrieval model, which is built from a sequence of linear transformations. The high-dimensional, sparse vectors are then reduced by singular value decomposition (SVD) and transformed into a low-dimensional vector space, namely the space representing the latent semantic meanings of words. The method has been tested on two test collections, the Medline collection and the Cranfield collection. To train the model, multiple partitions are created for each collection. The improvements in average precision, averaged over all partitions and compared with the latent semantic indexing (LSI) model, are 20.57% (Medline) and 22.23% (Cranfield) on the training data, and 0.47% (Medline) and 4.78% (Cranfield) on the test data. The proposed method makes it possible to preserve user-supplied relevance information in the system over the long term so that it can be used later. |
| |
Keywords: | Information retrieval; Supervised learning; Vector space model; Relevance feedback; Singular value decomposition; Linear transformation |
This article is indexed in ScienceDirect and other databases. |
|
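The abstract compares against the latent semantic indexing (LSI) model, in which sparse term vectors are reduced by SVD and matched in a low-dimensional space. The following is a minimal sketch of that baseline step only, assuming NumPy and a small dense term-by-document matrix. It is not the authors' supervised method: the relevance-feedback transformations are not described in enough detail here to reproduce, and the function names `lsi_fit`, `lsi_project`, and `cosine_rank` are hypothetical.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping k latent dimensions.

    Returns the term basis U_k, the singular values s_k, and the documents
    expressed in the latent space (one row per document).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # U_k^T d_j equals s_k * v_j, so scale the right singular vectors.
    doc_vecs = Vt[:k, :].T * s_k
    return U_k, s_k, doc_vecs

def lsi_project(query, U_k):
    """Map a term-space query vector into the same latent space (U_k^T q)."""
    return query @ U_k

def cosine_rank(query_vec, doc_vecs):
    """Rank documents by cosine similarity to the query, best first."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (doc_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims

# Toy data (assumed for illustration): rows are the terms
# heart, cardiac, wing, airflow; columns are four documents.
A = np.array([
    [1.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])
U_k, s_k, docs = lsi_fit(A, k=2)
query = np.array([1.0, 0.0, 0.0, 0.0])  # a query containing only "heart"
ranking, sims = cosine_rank(lsi_project(query, U_k), docs)
```

In this toy matrix the two medical documents and the two aeronautics documents share no terms, so with `k=2` the query lands on the medical documents and scores near zero against the others. A supervised variant along the lines of the abstract would additionally adjust the space using known relevant documents, which this sketch does not attempt.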