Methods of Finding the Collocation Based on Statistics

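Cite this article:	Sun Jian, Wang Wei, Zhong Yixin. Methods of Finding the Collocation Based on Statistics [J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2002, 21(1): 12-16.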

Authors:	Sun Jian, Wang Wei, Zhong Yixin

Affiliation:	Intelligence Research Center, Beijing University of Posts and Telecommunications, Beijing 100876

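Funding:	Supported by the National Natural Science Foundation of China. Project: Intelligence-Oriented Information Theory and Its Applications. Project number: 69982001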

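Abstract:	A collocation is a commonly used phrase that expresses some event or thing, typically a bigram or a trigram. Automatic discovery of collocations plays an important role in natural language processing: it can enlarge the lexicon and improve system performance. This paper proposes four methods for deciding whether a bigram is a collocation and compares the results of each method. Building on the bigrams already identified, it then proposes a method for counting and discovering trigrams and longer n-grams. Constructing trigrams from bigrams in this way requires far less computation than counting all trigrams. Experiments show that the method gives good results.
Keywords:	collocation; bigram; natural language processing
Revised:	October 18, 2000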
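The abstract does not spell out how trigrams are built from bigrams, so the following is only a minimal sketch under a stated assumption: a trigram is counted only if it contains at least one bigram already accepted as a collocation. The function name `trigram_candidates`, the `min_count` threshold, and the sample sentence are all hypothetical and are not taken from the paper. The point it illustrates is the saving in computation: most trigram positions are skipped without being counted.

```python
from collections import Counter

def trigram_candidates(tokens, known_bigrams, min_count=2):
    """Count only trigrams that contain an already accepted bigram.

    Assumption (not from the paper): a trigram (w1, w2, w3) is a
    candidate when (w1, w2) or (w2, w3) is in known_bigrams.
    """
    counts = Counter()
    for i in range(len(tokens) - 2):
        w1, w2, w3 = tokens[i], tokens[i + 1], tokens[i + 2]
        # Skip every trigram that does not extend a known bigram.
        if (w1, w2) in known_bigrams or (w2, w3) in known_bigrams:
            counts[(w1, w2, w3)] += 1
    # Keep only candidates seen often enough to be worth scoring.
    return {tri: c for tri, c in counts.items() if c >= min_count}

# Hypothetical toy corpus and bigram set, for illustration only.
tokens = ("natural language processing uses "
          "natural language processing tools").split()
known = {("natural", "language")}
result = trigram_candidates(tokens, known)
print(result)
```

In a fuller pipeline the surviving candidates would then be scored with an association measure, in the same way as the bigrams.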
Methods of Finding the Collocation Based on Statistics

Sun Jian, Wang Wei and Zhong Yixin. Methods of Finding the Collocation Based on Statistics [J]. Journal of the China Society for Scientific and Technical Information, 2002, 21(1): 12-16.

Authors:	Sun Jian, Wang Wei and Zhong Yixin

Abstract:	A collocation is defined as a sequence of two or more consecutive words that has the characteristics of a syntactic and semantic unit, and whose exact and unambiguous meaning or connotation cannot be derived directly from the meaning or connotation of its components. Finding collocations automatically is an important task. The paper puts forward several methods for deciding whether an expression is a collocation: mutual information, the t-test, Pearson's chi-square test, and the likelihood ratio. The results show that the methods are feasible.
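The four measures named in the abstract can be sketched from their standard textbook definitions over a 2x2 contingency table. This is not the authors' implementation: the function names are hypothetical, the counts in the example are illustrative rather than taken from the paper, and the t-score uses the usual Bernoulli variance approximation. Here `c12` is the bigram count, `c1` and `c2` are the counts of the two words, and `n` is the total number of bigram positions.

```python
import math

def contingency(c12, c1, c2, n):
    """Observed 2x2 table: (w1 w2, w1 not-w2, not-w1 w2, neither)."""
    return (c12, c1 - c12, c2 - c12, n - c1 - c2 + c12)

def pmi(c12, c1, c2, n):
    """Pointwise mutual information, log2 of P(w1 w2) / (P(w1) P(w2))."""
    return math.log2(c12 * n / (c1 * c2))

def t_score(c12, c1, c2, n):
    """t statistic against the independence hypothesis."""
    x_bar = c12 / n                # observed bigram probability
    mu = (c1 / n) * (c2 / n)       # expected probability if independent
    s2 = x_bar * (1 - x_bar)       # Bernoulli variance
    return (x_bar - mu) / math.sqrt(s2 / n)

def chi_square(c12, c1, c2, n):
    """Pearson's chi-square statistic for the 2x2 table."""
    o11, o12, o21, o22 = contingency(c12, c1, c2, n)
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return num / den

def log_likelihood_ratio(c12, c1, c2, n):
    """G^2 = 2 * sum(O * ln(O / E)); empty cells contribute zero."""
    observed = contingency(c12, c1, c2, n)
    expected = (c1 * c2 / n, c1 * (n - c2) / n,
                (n - c1) * c2 / n, (n - c1) * (n - c2) / n)
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

# Illustrative counts only (not from the paper).
counts = dict(c12=8, c1=15828, c2=4675, n=14307668)
scores = {f.__name__: f(**counts)
          for f in (pmi, t_score, chi_square, log_likelihood_ratio)}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

A bigram is usually accepted as a collocation when its score exceeds a chosen threshold, for example 3.841 for the chi-square statistic at the 0.05 level with one degree of freedom. The measures can rank the same candidates differently, which is why a comparison of their results is informative.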

Keywords:	collocation; n-gram; natural language processing
This article is indexed in CNKI, Wanfang Data, and other databases.
