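A Feature Selection Method for Text Categorization Based on Fuzzy Relations
Cite this article: Zhen Zhilong, Han Lixin, Lu Dianlong. A Feature Selection Method for Text Categorization Based on Fuzzy Relations [J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2008, 27(6).
Authors: Zhen Zhilong  Han Lixin  Lu Dianlong
Affiliations: 1. College of Computer and Information Engineering, Hohai University, Nanjing 210098; Department of Computer Science, Tonghua Teachers College, Tonghua 134002
2. College of Computer and Information Engineering, Hohai University, Nanjing 210098
Abstract: Effective text categorization hinges on reducing the dimensionality of the high-dimensional feature space; dimensionality reduction methods divide into feature selection and feature extraction. An analysis of existing feature selection methods shows that they select features using document counts alone and do not consider the weights of the terms. To find the essential features, we propose a feature selection method based on the fuzzy relation between terms and classes, in which term weights determine the degree of membership. A KNN classifier was trained and tested on the standard Reuters-21578 text collection. In the experiments, the method achieved the highest macro-average and micro-average, 81.82% and 94.88% respectively; its macro-average exceeds those of IG and CHI by 4.73% and 1.12%, and its micro-average exceeds those of IG and CHI by 1.56% and 0.21%.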

Keywords: text categorization  term weights  fuzzy relation  feature selection

Feature Selection Based on Fuzzy Relation for Text Categorization
Zhen Zhilong, Han Lixin, Lu Dianlong. Feature Selection Based on Fuzzy Relation for Text Categorization [J]. Journal of the China Society for Scientific and Technical Information, 2008, 27(6).
Authors:Zhen Zhilong  Han Lixin  Lu Dianlong
Institution: Zhen Zhilong (1,2), Han Lixin (1), Lu Dianlong (1). 1. College of Computer & Information Engineering, Hohai University, Nanjing 210098; 2. Department of Computer Science, Tonghua Teachers College, Tonghua 134002
Abstract: For the effective implementation of text categorization, the key step is dimensionality reduction of the high-dimensional feature space, which includes feature selection and feature extraction. An analysis of previous feature selection methods shows that they use only document counts to choose features and do not use term weights. To discover the essential features by taking full advantage of term weights, training samples and classes, a method of feature selection based on fuzzy relation between terms a...
Keywords:text categorization  term weights  fuzzy relation  feature selection  
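
The abstract states that term weights determine each term's degree of membership in a class, but it does not give the formula. The Python sketch below is one plausible reading, not the authors' definition: it assumes the membership of term t in class c is the sum of t's weights over documents of class c divided by the sum of t's weights over all documents, and it assumes terms are ranked by their largest membership over classes. The function names `fuzzy_membership` and `select_features`, and the dict-of-weights document format, are hypothetical.

```python
from collections import defaultdict


def fuzzy_membership(docs, labels):
    """Return {term: {class: membership}} with memberships in [0, 1].

    docs   : list of {term: weight} dicts (e.g. TF-IDF weights), one per document
    labels : list of class labels, aligned with docs

    ASSUMPTION (not from the paper): membership(t, c) is the share of
    term t's total weight that falls in documents of class c.
    """
    per_class = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for weights, label in zip(docs, labels):
        for term, w in weights.items():
            per_class[term][label] += w
            total[term] += w
    return {
        term: {c: s / total[term] for c, s in by_class.items()}
        for term, by_class in per_class.items()
        if total[term] > 0
    }


def select_features(docs, labels, k):
    """Keep the k terms with the highest maximum membership over classes.

    ASSUMPTION: scoring by the maximum is an illustrative choice; ties are
    broken alphabetically so the result is deterministic.
    """
    mu = fuzzy_membership(docs, labels)
    scores = {term: max(by_class.values()) for term, by_class in mu.items()}
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]
```

Unlike a document-count criterion, this score changes when a term's weight changes even if the set of documents containing it stays the same. Under this particular assumption, a term that occurs in a single document always receives membership 1, so a practical variant would likely need a minimum-frequency filter.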
This article is indexed in CNKI, Wanfang Data, and other databases.