
A Review on the Research of Word Similarity Algorithms
Cite as: Li Hui. A Review on the Research of Word Similarity Algorithms [J]. 现代情报, 2015, 35(4): 172-177.
Author: Li Hui
Affiliation: Library, Nanjing University of Posts and Telecommunications, Nanjing 210046, Jiangsu, China
Abstract: Word similarity computation is widely used in natural language processing tasks such as information retrieval, word sense disambiguation, and machine translation. Existing word similarity algorithms fall into two main categories: statistical methods and methods based on semantic resources. The former compute similarity from the contextual information that co-occurs with a word in a large-scale corpus; the latter compute similarity from a manually constructed semantic dictionary or semantic network. This paper compares and analyzes the two categories, focuses on algorithms based on Web corpora and on Wikipedia, and summarizes the characteristics and shortcomings of each. It concludes that, under the influence of information technology, word similarity algorithms based on Wikipedia and on hybrid techniques, together with similarity computation driven by Linked Data, are promising directions for development.

Keywords: word similarity; semantic resources; corpus; Wikipedia; WordNet
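The two families of methods named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the three-sentence corpus, the `TOY_HYPERNYMS` taxonomy, and every function name are hypothetical and chosen only for illustration. The statistical side is shown as cosine similarity over co-occurrence counts, and the semantic-resource side as an inverse path length in a hand-built hierarchy, which stands in for a resource such as WordNet.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical hand-built taxonomy (child -> parent); an assumption, not real WordNet data.
TOY_HYPERNYMS = {
    "cat": "feline", "feline": "mammal",
    "dog": "canine", "canine": "mammal",
    "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
}


def ancestors(word):
    """Return [word, parent, grandparent, ...] up to the root of the toy taxonomy."""
    chain = [word]
    while chain[-1] in TOY_HYPERNYMS:
        chain.append(TOY_HYPERNYMS[chain[-1]])
    return chain


def path_similarity(a, b):
    """1 / (1 + edges on the path from a to b through their lowest common ancestor)."""
    depth_b = {node: d for d, node in enumerate(ancestors(b))}
    for depth_a, node in enumerate(ancestors(a)):
        if node in depth_b:
            return 1.0 / (1 + depth_a + depth_b[node])
    return 0.0  # no common ancestor in the taxonomy


if __name__ == "__main__":
    # Hypothetical toy corpus, already tokenized.
    corpus = [
        ["the", "cat", "drinks", "milk"],
        ["the", "dog", "drinks", "water"],
        ["the", "car", "needs", "fuel"],
    ]
    vecs = context_vectors(corpus)
    print("statistical  cat~dog:", cosine(vecs["cat"], vecs["dog"]))
    print("statistical  cat~car:", cosine(vecs["cat"], vecs["car"]))
    print("taxonomy     cat~dog:", path_similarity("cat", "dog"))
    print("taxonomy     cat~sparrow:", path_similarity("cat", "sparrow"))
```

The sketch reflects the trade-off the abstract describes. The corpus-based score needs no hand-built resource but depends entirely on which contexts happen to appear in the corpus. The taxonomy-based score is independent of any corpus but covers only the words that someone has entered into the hierarchy.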
This article is indexed in CNKI and other databases.