
Research on a Text Classification Method Based on Rough Set Weighting
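Cite this article: Hu Qinghua, Xie Zongxia, Yu Daren. Research on a Text Classification Method Based on Rough Set Weighting [J]. Journal of the China Society for Scientific and Technical Information, 2005, 24(1): 59-63.
Authors: Hu Qinghua, Xie Zongxia, Yu Daren
Affiliation: Harbin Institute of Technology, Harbin 150001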
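Abstract: Automatic text classification is an important research topic in intelligent information processing. This paper analyzes the basic characteristics of statistics-based text classification and proposes a new feature-word weighting formula built on the quality of classification in the variable precision rough set model. Compared with the widely used inverse document frequency (IDF) weighting, the new method substantially improves the distribution of text samples in the feature space: within-class distances shrink and between-class distances grow, which in principle should make the samples more separable. Experiments with two classifiers, support vector machines (SVM) and K-nearest neighbors (KNN), confirm that the new weighting method does improve classification results.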

Keywords: text classification; weighting formula; within-class scatter; between-class scatter
Revised: April 28, 2004
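The abstract contrasts IDF with a weight derived from the variable precision rough set (VPRS) quality of classification, but does not give the formula itself. The sketch below is only one plausible reading, not the authors' method: it assumes each term is treated as a single binary condition attribute ("document contains the term or not"), uses a hypothetical admissible-error parameter `beta`, and counts a block of documents toward the β-positive region when its majority-class proportion is at least 1 - β. The function names `idf_weights`, `vprs_quality`, and `vprs_weights` are illustrative.

```python
import math
from collections import Counter, defaultdict


def idf_weights(docs):
    """Classic inverse document frequency: log(N / df(t))."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / d) for t, d in df.items()}


def vprs_quality(docs, labels, term, beta=0.2):
    """Hypothetical VPRS quality of classification for one term.

    Assumption: the only condition attribute is 'term present?', so the
    documents split into at most two equivalence classes. A class joins the
    beta-positive region when its majority label covers >= 1 - beta of it.
    """
    blocks = defaultdict(list)
    for doc, y in zip(docs, labels):
        blocks[term in doc].append(y)
    positive = 0
    for ys in blocks.values():
        majority = Counter(ys).most_common(1)[0][1]
        if majority / len(ys) >= 1 - beta:
            positive += len(ys)
    return positive / len(docs)


def vprs_weights(docs, labels, beta=0.2):
    """Map every vocabulary term to its (hypothetical) VPRS quality weight."""
    vocab = {t for doc in docs for t in doc}
    return {t: vprs_quality(docs, labels, t, beta) for t in vocab}
```

On a toy corpus such as `[{"ball", "goal"}, {"goal", "team"}, {"stock", "market"}, {"market", "bank"}]` labelled sport, sport, finance, finance, IDF gives the rare word "ball" a higher weight than "goal", whereas this VPRS-style weight gives "goal" the maximum value 1.0 because it splits the two classes cleanly. That difference (rarity versus class-discriminating power) is the intuition behind preferring a class-aware weight.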

Weighting Algorithm for Text Classification Based on Rough Set Approach
Hu Qinghua, Xie Zongxia and Yu Daren. Weighting Algorithm for Text Classification Based on Rough Set Approach [J]. Journal of the China Society for Scientific and Technical Information, 2005, 24(1): 59-63.
Authors: Hu Qinghua, Xie Zongxia and Yu Daren
Abstract: Automatic text classification is one of the key subjects in intelligent information processing. In this paper we analyze the characteristics of statistics-based text classification and introduce a novel weighting formula for feature words that applies the approximation quality of the variable precision rough set model. Compared with the inverse document frequency (IDF) weighting formula, it improves the distribution of text samples: samples within the same class become more compact and samples from different classes become more widely separated, which should benefit automatic classification. Experiments with support vector machine (SVM) and K-nearest neighbor classifiers compare the weighting schemes, and the results show that the novel weighting method improves classification performance.
Keywords: text classification; weighting formula; within-class scatter; between-class scatter
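The claim that the new weights make classes "more compact" and "more widely separated" can be quantified with within-class and between-class scatter. The sketch below uses the standard trace form, S_w = Σ_c Σ_{x∈c} ‖x − μ_c‖² and S_b = Σ_c n_c ‖μ_c − μ‖², which may differ from the exact measure used in the paper; `scatter` and `reweight` are hypothetical helper names. Multiplying each feature column by a per-term weight and recomputing the two quantities shows how a weighting scheme moves the samples.

```python
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter(X, labels):
    """Return (S_w, S_b) as traces of the within/between-class scatter matrices."""
    groups = defaultdict(list)
    for x, y in zip(X, labels):
        groups[y].append(x)
    mu = mean(X)
    s_w = s_b = 0.0
    for members in groups.values():
        mu_c = mean(members)
        s_w += sum(sq_dist(x, mu_c) for x in members)
        s_b += len(members) * sq_dist(mu_c, mu)
    return s_w, s_b


def reweight(X, w):
    """Scale feature j of every sample by the term weight w[j]."""
    return [[xi * wi for xi, wi in zip(x, w)] for x in X]
```

For example, with `X = [[1, 0], [1, 1], [0, 0], [0, 1]]` and labels a, a, b, b, feature 0 separates the classes and feature 1 is noise; `scatter` returns (1.0, 1.0). Down-weighting the noisy feature with `reweight(X, [1.0, 0.25])` leaves S_b at 1.0 but cuts S_w to 0.0625, so the ratio S_b / S_w rises from 1 to 16. A good term-weighting scheme is expected to push this ratio up in the same way.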
This article is indexed in CNKI, Wanfang Data, and other databases.