基于粗糙集加权的文本分类方法研究 (Weighting Algorithm for Text Classification Based on Rough Set Approach)


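Cite this article:	Hu Qinghua, Xie Zongxia, Yu Daren. 基于粗糙集加权的文本分类方法研究 [Weighting Algorithm for Text Classification Based on Rough Set Approach] [J]. 情报学报 (Journal of the China Society for Scientific and Technical Information), 2005, 24(1): 59-63.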

Authors:	Hu Qinghua, Xie Zongxia, Yu Daren

Affiliation:	Harbin Institute of Technology, Harbin 150001

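Abstract (translated from the Chinese):	Automatic text classification is an important research topic in intelligent information processing. This paper analyzes the basic characteristics of text classification based on statistical theory and proposes a new formula for computing feature-word weights, constructed from the classification quality of the variable precision rough set model. Compared with the widely used inverse document frequency weighting, the new weighting method greatly improves the distribution of text samples over the whole space: within-class distances decrease and between-class distances increase, which in theory improves the separability of the samples. Finally, experiments with two classifiers, a support vector machine and K-nearest neighbor, confirm that the new weighting method does improve classification performance.
Keywords:	text classification; weighting formula; within-class scatter; between-class scatter
Revised manuscript received:	April 28, 2004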
Weighting Algorithm for Text Classification Based on Rough Set Approach

Hu Qinghua, Xie Zongxia and Yu Daren. Weighting Algorithm for Text Classification Based on Rough Set Approach[J]. Journal of the China Society for Scientific and Technical Information, 2005, 24(1): 59-63.

Authors:	Hu Qinghua, Xie Zongxia and Yu Daren

Abstract:	Automatic text classification is one of the key subjects in intelligent information processing. In this paper we analyze the characteristics of text classification based on statistical theory and introduce a novel weighting formula for feature words that applies the approximation quality of the variable precision rough set model. Compared with the widely used inverse document frequency weighting formula, it improves the distribution of text samples: samples in the same class become more compact and samples in different classes move farther apart, which should make the samples easier to separate. Experiments with SVM and K-nearest neighbor classifiers are performed under the different weighting algorithms. The results show that the novel weighting method improves classification performance.

Keywords:	text classification; weighting formula; within-class scatter; between-class scatter
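
The abstract contrasts inverse document frequency weighting with a weight built from the approximation quality of the variable precision rough set model, but it does not give the paper's formula. The following is only a minimal sketch of the two underlying quantities, under stated assumptions: documents are lists of tokens, a term partitions the corpus into documents that contain it and documents that do not, and a block belongs to the beta-positive region when its majority class covers at least a fraction `beta` of the block. The function names, the toy corpus, and the threshold `beta=0.8` are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
import math


def idf_weights(docs):
    """Inverse document frequency, log(N / df), for every term in the corpus."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / c) for t, c in df.items()}


def vprs_quality(docs, labels, term, beta=0.8):
    """Approximation quality of the partition induced by presence of `term`.

    Assumption: a block of documents counts toward the beta-positive
    region when its majority class makes up at least `beta` of the block.
    The result is the fraction of documents lying in such blocks.
    """
    blocks = defaultdict(list)
    for d, y in zip(docs, labels):
        blocks[term in d].append(y)
    pos = 0
    for ys in blocks.values():
        majority = Counter(ys).most_common(1)[0][1]
        if majority / len(ys) >= beta:
            pos += len(ys)
    return pos / len(docs)


# Toy corpus (hypothetical): two sport documents, two politics documents.
docs = [["ball", "team"], ["ball", "score"], ["vote", "team"], ["vote", "law"]]
labels = ["sport", "sport", "politics", "politics"]

idf = idf_weights(docs)
# "ball" and "team" each occur in 2 of 4 documents, so IDF treats them alike.
print(idf["ball"] == idf["team"])          # True
# The rough-set quality separates them: "ball" predicts the class, "team" does not.
print(vprs_quality(docs, labels, "ball"))  # 1.0
print(vprs_quality(docs, labels, "team"))  # 0.0
```

In this toy case IDF cannot tell a class-discriminating term from a non-discriminating one with the same document frequency, while a class-aware quality measure can. How the paper combines such a quality value with term frequency is not stated in the abstract and is left open here.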
This article is indexed in CNKI, Wanfang Data, and other databases.
