The Dilution/Concentration conditions for cross-language information retrieval models |
| |
Authors: | Bo Li, Eric Gaussier, Dan Yang |
| |
Institution: | 1. School of Computer Science, Central China Normal University, Wuhan, China; 2. CNRS-LIG/AMA, Université Grenoble Alpes, Grenoble, France; 3. China Electric Power Research Institute, Wuhan, China |
| |
Abstract: | Experimental results in cross-language information retrieval (CLIR) do not indicate why a model fails or how it could be improved. A basic research question is thus whether one can provide conditions with which any existing or new CLIR strategy can be evaluated analytically and the design of CLIR models can be improved. Inspired by heuristics in monolingual IR, we introduce in this paper Dilution/Concentration (D/C) conditions that characterize good CLIR models, based on direct intuitions under artificial settings. The conditions, derived from first principles in CLIR, generalize the idea of the query structuring approach. Empirical results with state-of-the-art CLIR models show that failure to satisfy a condition often indicates non-optimality of the method. In general, we find that the empirical performance of a retrieval formula is closely related to how well it satisfies the conditions. Lastly, following the D/C conditions, we propose several novel CLIR models built on the information-based models, which again shows that the D/C conditions are effective at characterizing good CLIR models. |
| |
Keywords: | Cross-language information retrieval; D/C condition; Information retrieval heuristic |
This article is indexed in ScienceDirect and other databases. |
|