基于语义网络社团划分的中文文本分类研究
引用本文:尹丽英,赵捧未.基于语义网络社团划分的中文文本分类研究[J].图书情报工作,2014,58(19):124-128.
作者姓名:尹丽英  赵捧未
作者单位:1. 西安电子科技大学经济与管理学院; 2. 西安邮电大学经济与管理学院
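Funding:This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on Peer-to-Peer Semantic Communities and Their Knowledge Sharing Based on Knowledge Maps" (No. 71103138) and the Fundamental Research Funds for the Central Universities project "Research on Business Intelligence Models Based on User-Generated Content in the Context of Big Data" (No. BDY231414).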
摘    要:为减少一词多义现象及训练样本的类偏斜问题对分类性能的影响,提出一种基于语义网络社团划分的中文文本分类算法。通过维基百科知识库对文本特征词进行消歧,构建出训练语义复杂网络以表示文本间的语义关系,再次结合节点特性采用K-means算法对训练集进行社团划分以改善类偏斜问题,进而查找待分类文本的最相近社团并以此为基础进行文本分类。实验结果表明,本文所提出的中文文本分类算法是可行的,且具有较好的分类效果。

关 键 词:语义网络  词义消歧  社团结构  文本分类  
Received:2014-06-09

A Chinese Text Classification Algorithm Based on Partitioning Community in Semantic Network
Yin Liying,Zhao Pengwei.A Chinese Text Classification Algorithm Based on Partitioning Community in Semantic Network[J].Library and Information Service,2014,58(19):124-128.
Authors:Yin Liying  Zhao Pengwei
Institution:1. School of Economics & Management, Xidian University, Xi'an 710071; 2. College of Economics and Management, Xi'an University of Post & Telecommunications, Xi'an 710121
Abstract:To reduce the impact of polysemy and of class skew in the training samples on classification performance, a Chinese text classification algorithm based on community partitioning of a semantic network is proposed. First, the feature words of each text are disambiguated using the Wikipedia knowledge base, and a semantic complex network over the training set is built to represent the semantic relations between texts. Then, to mitigate class skew, the training set is partitioned into communities using the K-means algorithm combined with node characteristics. Finally, the community nearest to the text to be classified is found, and the text is classified on that basis. Experimental results show that the proposed algorithm is feasible and achieves good classification performance.
Keywords:semantic network  word sense disambiguation  community structure  text classification  
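
The last two steps of the abstract (partitioning the training set into communities with K-means, then classifying a text by its nearest community) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the network construction, the node characteristics, or the distance measure, so the sketch assumes that each text is already a plain numeric feature vector (disambiguation done beforehand), that K-means is run separately inside each class, and that squared Euclidean distance is used. The names `kmeans`, `build_communities`, `classify`, and the `size` parameter are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain K-means; returns k centroids (k is capped at len(vectors))."""
    rng = random.Random(seed)
    k = min(k, len(vectors))
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, centroids[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def build_communities(train: Dict[str, List[Vector]], size: int) -> List[Tuple[str, List[float]]]:
    """Assumption: split each class into communities of roughly `size` texts,
    so a large class contributes several communities instead of dominating by count."""
    communities = []
    for label, vectors in train.items():
        k = max(1, round(len(vectors) / size))
        for centroid in kmeans(vectors, k):
            communities.append((label, centroid))
    return communities


def classify(vec: Vector, communities: List[Tuple[str, List[float]]]) -> str:
    """Return the class label of the community whose centroid is nearest to vec."""
    label, _ = min(communities, key=lambda lc: sq_dist(vec, lc[1]))
    return label


# Toy, made-up data: a skewed training set with 4 "sports" and 2 "finance" texts.
train = {
    "sports": [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [1.0, 0.1]],
    "finance": [[0.0, 1.0], [0.1, 0.9]],
}
communities = build_communities(train, size=2)
print(classify([0.95, 0.05], communities))
```

Representing each class by community centroids rather than by raw sample counts is one plausible reading of how community partitioning could reduce the effect of class skew; the paper itself should be consulted for the actual procedure.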