

A Corpus-Based Learning Method of Compound Noun Indexing Rules for Korean
Authors: Jee-Hyub Kim, Byung-Kwan Kwak, Seungwoo Lee, Geunbae Lee, Jong-Hyeok Lee
Institutions: (1) Biological Research Information Center (BRIC), Pohang, South Korea; (2) Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea
Abstract: In Korean information retrieval, compound nouns play an important role in improving precision in search experiments. There are two major approaches to compound noun indexing in Korean: statistical and linguistic. Each, however, has its own shortcomings, such as a limited ability to index diverse types of compound nouns, over-generation of compound nouns, and data sparseness in training. In this paper, we propose a corpus-based learning method that indexes diverse types of compound nouns using rules automatically extracted from a large corpus. The automatic learning method is more portable and requires less human effort than the manual linguistic approach, while achieving a similar level of performance. We also present a new filtering method to address the problems of compound noun over-generation and data sparseness.
Keywords: corpus-based learning; compound noun indexing; filtering; information retrieval; search performance evaluation
This article is indexed in SpringerLink and other databases.
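The abstract does not specify the rule format or how the filter works. As a rough illustration only, the sketch below assumes that an indexing rule is a sequence of part-of-speech tags observed over known compound nouns in a tagged corpus, and that filtering keeps only candidates whose frequency in a document collection meets a threshold. The tag names (`N`, `P`), the function names, and the frequency filter are hypothetical stand-ins, not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# (word, POS tag); the tags "N" (noun) and "P" (particle) are hypothetical placeholders
Token = Tuple[str, str]


def learn_rules(corpus: List[List[Token]],
                known_compounds: Set[Tuple[str, ...]],
                min_count: int = 2) -> Set[Tuple[str, ...]]:
    """Collect tag sequences that span known compound nouns in a tagged corpus."""
    counts: Counter = Counter()
    for sentence in corpus:
        words = [w for w, _ in sentence]
        tags = [t for _, t in sentence]
        for compound in known_compounds:
            n = len(compound)
            for i in range(len(words) - n + 1):
                if tuple(words[i:i + n]) == compound:
                    counts[tuple(tags[i:i + n])] += 1
    # Keep only tag patterns seen often enough to be treated as rules.
    return {pattern for pattern, c in counts.items() if c >= min_count}


def apply_rules(sentence: List[Token],
                rules: Set[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Generate candidate compound nouns wherever a learned tag pattern matches."""
    tags = [t for _, t in sentence]
    candidates = []
    for pattern in sorted(rules):  # sorted for a deterministic output order
        n = len(pattern)
        for i in range(len(sentence) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                candidates.append(tuple(w for w, _ in sentence[i:i + n]))
    return candidates


def filter_candidates(candidates: List[Tuple[str, ...]],
                      collection_freq: Dict[Tuple[str, ...], int],
                      threshold: int = 1) -> List[Tuple[str, ...]]:
    """Drop candidates that are rare in the collection (a simple stand-in filter)."""
    return [c for c in candidates if collection_freq.get(c, 0) >= threshold]


# Toy usage: "정보 검색" (information retrieval) is a known compound noun.
corpus = [
    [("정보", "N"), ("검색", "N"), ("이", "P")],
    [("정보", "N"), ("검색", "N"), ("시스템", "N")],
]
rules = learn_rules(corpus, {("정보", "검색")}, min_count=2)
candidates = apply_rules(corpus[1], rules)
kept = filter_candidates(candidates, {("정보", "검색"): 5, ("검색", "시스템"): 0})
```

In this toy run the learned noun-noun pattern also proposes "검색 시스템", which shows in miniature the over-generation problem the abstract mentions; the hypothetical frequency filter then discards it because it never occurs in the assumed collection counts.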