Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval
Institution:1. Grupo LYS, Departamento de Computación, Facultade de Informática, Universidade da Coruña, Campus de Elviña, 15071 – A Coruña, Spain;2. Grupo COLE, Departamento de Informática, E.S. de Enxeñaría Informática, Universidade de Vigo, Campus As Lagoas, 32004 – Ourense, Spain
Abstract:In contrast with their monolingual counterparts, Cross-Language Information Retrieval (CLIR) systems have received little attention regarding the effects of misspelled queries on their performance. The present work makes a first attempt to fill this gap by extending our previous work on monolingual retrieval, this time studying how the progressive addition of misspellings to input queries affects the output of CLIR systems. Two approaches for dealing with this problem are analyzed. The first is automatic spelling correction, for which we consider two algorithms: one that corrects isolated words and one that corrects a misspelled word based on its linguistic context. The second is the use of character n-grams both as index terms and as translation units, seeking to take advantage of their inherent robustness and language independence. Both approaches have been tested on a Spanish-to-English CLIR system, that is, Spanish queries run against English documents. Real, user-generated spelling errors have been used under a methodology that allows us to study the effectiveness of each approach and its behavior at different error rates. The results show that classic word-based approaches are highly sensitive to misspelled queries, although spelling correction techniques can mitigate these negative effects. The use of character n-grams, on the other hand, provides great robustness against misspellings.
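The robustness attributed to character n-grams can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `char_ngrams` and `dice`, the choice of n = 3, the Dice coefficient as overlap measure, and the example word pair are all assumptions made for illustration. The idea is that a misspelled word fails an exact word-level match entirely, while it still shares most of its character n-grams with the intended word.

```python
def char_ngrams(word, n=3):
    """Return the overlapping character n-grams of a word, lowercased.

    n = 3 is an arbitrary choice for this sketch, not a value from the paper.
    Words no longer than n are returned whole.
    """
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice(a, b):
    """Dice coefficient between two collections of terms, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical Spanish query term and a misspelling with one deleted letter.
correct = "biblioteca"
misspelled = "bibloteca"

# Word-based indexing: the misspelled term does not match at all.
word_match = correct == misspelled

# n-gram indexing: most trigrams survive the error.
overlap = dice(char_ngrams(correct), char_ngrams(misspelled))

print(word_match)          # False
print(round(overlap, 4))   # 0.6667
```

Here the single deleted letter destroys the word-level match, yet five of the trigrams (`bib`, `ibl`, `ote`, `tec`, `eca`) are shared, so a retrieval system indexing on n-grams would still find partial evidence for the intended term.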
This article is indexed in ScienceDirect and other databases.