A reliable FAQ retrieval system using a query log classification technique based on latent semantic analysis |
| |
Authors: | Harksoo Kim, Hyunjung Lee, Jungyun Seo |
| |
Institution: | 1. Program of Computer and Communications Engineering, College of Information Technology, Kangwon National University, 192-1 Hyoja 2(i)-dong, Chuncheon-si, Gangwon-do 200-701, Republic of Korea; 2. Natural Language Processing Laboratory, Department of Computer Science, Sogang University, Sinsu-dong 1, Seoul 121-742, Republic of Korea; 3. Department of Computer Science and Interdisciplinary Program of Integrated Biotechnology, Sogang University, 1 Sinsu-dong, Mapo-gu, Seoul 121-742, Republic of Korea |
| |
Abstract: | To achieve high performance, previous work on FAQ retrieval relied on high-level knowledge bases or handcrafted rules. However, constructing these knowledge bases and rules is time-consuming and labor-intensive, and the work must be repeated whenever the application domain changes. To overcome this problem, we propose a high-performance FAQ retrieval system that uses only users’ query logs as knowledge sources. At indexing time, the proposed system efficiently clusters users’ query logs using classification techniques based on latent semantic analysis. At retrieval time, the proposed system smooths FAQs using the query log clusters. In the experiments, the proposed system outperformed conventional information retrieval systems in FAQ retrieval. Based on various experiments, we found that the proposed system could alleviate critical lexical disagreement problems in short document retrieval. In addition, we believe that the proposed system is more practical and reliable than previous FAQ retrieval systems because it uses only data-driven methods without high-level knowledge sources. |
| |
Keywords: | FAQ retrieval; Lexical disagreement problem; Query log clusters; Latent semantic analysis |
This article is indexed in ScienceDirect and other databases. |
|
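The two stages named in the abstract (clustering query logs with an LSA-based classifier at indexing time, then smoothing FAQs with those clusters at retrieval time) can be illustrated with a small sketch. This is not the authors' implementation: the toy FAQs and logs, the function names (`cluster_logs`, `retrieve`), the choice of `k=2` latent dimensions, the nearest-FAQ cosine rule, and the additive smoothing with `weight=0.5` are all assumptions made for illustration. It assumes NumPy is available.

```python
import numpy as np

def tokenize(text):
    return text.lower().split()

def build_vocab(texts):
    # Map each distinct token to a column index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def bow(text, vocab):
    # Bag-of-words count vector; out-of-vocabulary tokens are ignored.
    vec = np.zeros(len(vocab))
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def cluster_logs(faqs, logs, k=2):
    """Indexing time: assign each query log to its nearest FAQ in LSA space."""
    vocab = build_vocab(faqs + logs)
    docs = np.array([bow(t, vocab) for t in faqs + logs])
    # Truncated SVD: the top-k right singular vectors span the latent space.
    _, _, vt = np.linalg.svd(docs, full_matrices=False)
    latent = docs @ vt[:k].T
    faq_lat, log_lat = latent[:len(faqs)], latent[len(faqs):]
    clusters = {i: [] for i in range(len(faqs))}
    for log, vec in zip(logs, log_lat):
        best = max(range(len(faqs)), key=lambda i: cosine(faq_lat[i], vec))
        clusters[best].append(log)
    return vocab, clusters

def retrieve(query, faqs, vocab, clusters, weight=0.5):
    """Retrieval time: score the query against FAQs smoothed by their clusters."""
    q = bow(query, vocab)
    scores = []
    for i, faq in enumerate(faqs):
        cluster_terms = sum((bow(l, vocab) for l in clusters[i]),
                            np.zeros(len(vocab)))
        smoothed = bow(faq, vocab) + weight * cluster_terms
        scores.append(cosine(q, smoothed))
    return int(np.argmax(scores)), scores

# Hypothetical toy data, invented for this sketch.
FAQS = ["how to reset account password", "international shipping cost"]
LOGS = ["forgot account password", "forgot account credentials",
        "international delivery cost", "overseas delivery fee"]

vocab, clusters = cluster_logs(FAQS, LOGS)
best, scores = retrieve("forgot credentials", FAQS, vocab, clusters)
print(clusters)
print(FAQS[best], scores)
```

The query "forgot credentials" shares no word with either FAQ, so plain term matching gives both a score of zero. After smoothing, the password FAQ carries the vocabulary of the logs clustered under it, which is the kind of lexical disagreement the abstract says query log clusters can alleviate.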