Effective foreign word extraction for Korean information retrieval期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

按检索

Effective foreign word extraction for Korean information retrieval

Institution:	Division of Computer Science, Department of Electrical Engineering & Computer Science, Advanced Information Technology Research Center (AITrc), Korea Terminology Research Center for Language and Knowledge Engineering (KORTERM), Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon 305-701, South Korea

Abstract:	In Korean text, foreign words, which are mostly transliterations of English words, are frequently used. Foreign words are usually very important index terms in Korean information retrieval since most of them are technical terms or names. So accurate foreign word extraction is crucial for high performance of information retrieval. However, accurate foreign word extraction is not easy because it inevitably accompanies word segmentation and most of the foreign words are unknown. In this paper, we present an effective foreign word recognition and extraction method. In order to accurately extract foreign words, we developed an effective method of word segmentation that involves unknown foreign words. Our word segmentation method effectively utilizes both unknown word information acquired through the automatic dictionary compilation and foreign word recognition information. Our HMM-based foreign word recognition method does not require large labeled examples for the model training unlike the previously proposed method.

Keywords:
本文献已被 ScienceDirect 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏