Classification Algorithm of Chinese Short Texts Based on Wikipedia (一种基于维基百科的中文短文本分类算法)


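Cite this article:	Zhao Hui, Liu Huailiang. Classification Algorithm of Chinese Short Texts Based on Wikipedia (一种基于维基百科的中文短文本分类算法) [J]. Library and Information Service (图书情报工作), 2013, 57(11): 120-124.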

Authors:	Zhao Hui (赵辉), Liu Huailiang (刘怀亮)

Affiliation:	School of Economics and Management, Xidian University (西安电子科技大学经济与管理学院)

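Abstract:	Short texts contain few feature words and carry weak concept signals. To address this, the paper uses Wikipedia for feature extension in support of Chinese short text classification. Drawing on Wikipedia concepts, links, and related information, it extracts the set of concepts related to each word and computes the relatedness between concepts. It uses disambiguation pages together with the context of the short text to resolve polysemy, and then extends features on the basis of the semantic relatedness between words, supplementing the semantic information carried by the text features. Finally, it presents a Wikipedia-based classification algorithm for Chinese short texts and verifies it experimentally. The results show that the algorithm effectively improves Chinese short text classification.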
Keywords:	short text classification; Wikipedia; word sense disambiguation; feature extension
Received:	2013-04-03
Revised:	2013-05-19
Classification Algorithm of Chinese Short Texts Based on Wikipedia

Zhao Hui, Liu Huailiang. Classification Algorithm of Chinese Short Texts Based on Wikipedia [J]. Library and Information Service, 2013, 57(11): 120-124.

Authors:	Zhao Hui, Liu Huailiang

Institution:	Department of Economic Management, Xidian University, Xi’an 710071

Abstract:	To address the scarcity of feature words and the weak concept signals in short texts, this paper proposes a Wikipedia-based feature extension method for classifying Chinese short texts. Using Wikipedia concepts and interlinks, the method extracts sets of related concepts and computes the relatedness between concepts, and it handles polysemy by combining disambiguation pages with context drawn from the short text. It then extends the features on the basis of semantic relatedness between words, supplementing the semantic information of the text features. Finally, the paper puts forward a Wikipedia-based classification algorithm for Chinese short texts and verifies it experimentally. The results show that the algorithm improves the classification of Chinese short texts.

Keywords:	short text classification; Wikipedia; word sense disambiguation; feature extension
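
The pipeline outlined in the abstract (related-concept extraction, relatedness computation, context-based disambiguation, feature extension) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the relatedness measure, so Jaccard overlap of link sets is assumed here, and the `LINKS` and `SENSES` tables, the function names, and the `threshold` parameter are all hypothetical toy stand-ins for real Wikipedia link and disambiguation-page data.

```python
from typing import Dict, List, Optional, Set

# Toy stand-in for Wikipedia link data: concept -> linked concepts.
# Every entry is invented for illustration only.
LINKS: Dict[str, Set[str]] = {
    "Apple Inc.": {"iPhone", "Computer", "Company"},
    "Apple (fruit)": {"Fruit", "Tree", "Orchard"},
    "iPhone": {"Apple Inc.", "Smartphone", "Computer"},
    "Smartphone": {"iPhone", "Computer", "Android"},
    "Fruit": {"Apple (fruit)", "Tree", "Orchard"},
}

# Toy stand-in for disambiguation pages: surface word -> candidate concepts.
SENSES: Dict[str, List[str]] = {
    "apple": ["Apple Inc.", "Apple (fruit)"],
    "iphone": ["iPhone"],
    "fruit": ["Fruit"],
}


def relatedness(a: str, b: str) -> float:
    """Jaccard overlap of two concepts' link sets (an assumed measure)."""
    links_a, links_b = LINKS.get(a, set()), LINKS.get(b, set())
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def disambiguate(word: str, context: List[str]) -> Optional[str]:
    """Pick the candidate sense most related to the context words' senses."""
    candidates = SENSES.get(word, [])
    if not candidates:
        return None
    context_concepts = [c for w in context for c in SENSES.get(w, [])]

    def score(candidate: str) -> float:
        return sum(relatedness(candidate, c) for c in context_concepts)

    # On a tie, max() keeps the first candidate listed.
    return max(candidates, key=score)


def expand_features(tokens: List[str], threshold: float = 0.1) -> Dict[str, float]:
    """Map tokens to concepts, then add related concepts weighted by relatedness."""
    features: Dict[str, float] = {}
    for i, word in enumerate(tokens):
        concept = disambiguate(word, tokens[:i] + tokens[i + 1:])
        if concept is None:
            features[word] = 1.0  # unknown word: keep it as a plain feature
            continue
        features[concept] = 1.0
        for other in LINKS:
            if other == concept:
                continue
            r = relatedness(concept, other)
            if r >= threshold:
                # Keep the strongest weight seen for each feature.
                features[other] = max(features.get(other, 0.0), r)
    return features


if __name__ == "__main__":
    print(expand_features(["apple", "iphone"]))
```

In this toy setup the short text `["apple", "iphone"]` resolves "apple" to the company sense and gains "Smartphone" as an extended feature. The expanded feature weights could then be passed to any standard text classifier.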
This article is indexed in Wanfang Data and other databases.