
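Web Text Categorization Rule Extraction Based on Rough Sets and Decision Trees
Cite this article:Wang Yu, Wang Zheng'ou, Wang Mingchun. Web Text Categorization Rule Extraction Based on Rough Sets and Decision Trees[J]. Journal of the China Society for Scientific and Technical Information, 2005, 24(6): 674-678.
Authors:Wang Yu  Wang Zheng'ou  Wang Mingchun
Affiliations:1. Institute of Systems Engineering, Tianjin University, Tianjin 300072; College of Mathematics and Computer Science, Hebei University, Baoding 071002
2. Institute of Systems Engineering, Tianjin University, Tianjin 300072
Funding:Supported by the National Natural Science Foundation of China (60275020)
Abstract:Drawing on the CHI statistic, rough set theory, and decision trees, this paper proposes a new method for extracting Web text categorization rules. Decision tree classifiers analyze data efficiently and yield classification rules that are easy to extract and easy to understand, but they are hard to apply to problems whose feature space has thousands or tens of thousands of dimensions. The method therefore first uses CHI values to select, for each text class, a number of terms that contribute strongly to categorization, and then applies rough set theory to reduce the selected features further. This yields a text feature vector space of much lower dimension. Finally, a decision tree performs the categorization, so that categorization accuracy is maintained while easily understood text categorization rules can be extracted with little effort.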
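The CHI-based selection step described in the abstract can be sketched as follows. The abstract does not give the formula, so this sketch assumes the standard chi-square statistic over a 2x2 term/class contingency table, as commonly used for text feature selection. The names `chi_square` and `top_terms_per_class` and the toy documents are hypothetical, chosen only for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class (assumed standard form).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Term or class is constant over the corpus: no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_terms_per_class(docs, labels, k):
    """Return the k highest-scoring terms for each class.

    docs is a list of term sets, labels the matching class labels.
    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = set().union(*docs)
    pairs = list(zip(docs, labels))
    result = {}
    for cls in set(labels):
        scores = {}
        for term in vocab:
            a = sum(1 for doc, y in pairs if y == cls and term in doc)
            b = sum(1 for doc, y in pairs if y != cls and term in doc)
            c = sum(1 for doc, y in pairs if y == cls and term not in doc)
            d = sum(1 for doc, y in pairs if y != cls and term not in doc)
            scores[term] = chi_square(a, b, c, d)
        result[cls] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return result


# Hypothetical toy corpus: two "sport" and two "finance" documents.
docs = [{"goal", "match"}, {"goal", "team"},
        {"stock", "market"}, {"stock", "price"}]
labels = ["sport", "sport", "finance", "finance"]
selected = top_terms_per_class(docs, labels, 2)
```

Note that this form of the statistic is symmetric: a term that is strongly absent from a class scores as high as one that is strongly present, which is why "stock" ranks alongside "goal" for the "sport" class in the toy corpus.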

Keywords:feature extraction  CHI value  rough set theory  decision tree
Revised manuscript received:24 December 2004

WEB Text Categorization Rule Extraction Based on Rough Set and Decision Tree
Wang Yu, Wang Zheng'ou, Wang Mingchun. WEB Text Categorization Rule Extraction Based on Rough Set and Decision Tree[J]. Journal of the China Society for Scientific and Technical Information, 2005, 24(6): 674-678.
Authors:Wang Yu  Wang Zheng'ou  Wang Mingchun
Abstract:This paper presents a new method for extracting Web text categorization rules, based on the CHI statistic, rough set theory, and decision trees. Decision trees analyze data efficiently and yield categorization rules that are easy to extract and understand, but they are hard to apply when the feature space has thousands of dimensions. The method therefore first uses CHI values to select the features that contribute most to text categorization, and then uses rough set theory to reduce the selected attributes further. This greatly reduces the dimension of the vector space. Finally, a decision tree is applied to categorize the texts. In this way, understandable categorization rules can be extracted easily while categorization accuracy is preserved.
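The rough set reduction step can be illustrated with a small sketch. The abstract does not say which reduction algorithm the authors use. The code below assumes a greedy, QuickReduct-style search driven by the dependency degree, which is the fraction of objects whose equivalence class under the chosen attributes contains a single decision label. The names `dependency` and `quick_reduct` and the toy table are hypothetical.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Dependency degree of the decision on the attribute subset attrs.

    Objects are grouped by their values on attrs. A group belongs to the
    positive region when all of its objects share one label.
    """
    seen_labels = defaultdict(set)
    sizes = defaultdict(int)
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(y)
        sizes[key] += 1
    positive = sum(sizes[k] for k, ys in seen_labels.items() if len(ys) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add attributes until the full-set dependency is reached.

    This is a heuristic: it returns an attribute subset with the same
    dependency degree as the full set, not necessarily a minimal one.
    """
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return sorted(reduct)


# Hypothetical decision table: each row holds binary term indicators,
# and only attribute 0 is needed to determine the class.
rows = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
labels = ["a", "a", "b", "b"]
kept = quick_reduct(rows, labels)
```

The attributes that survive this step would then form the low-dimensional input to the decision tree, from which each root-to-leaf path can be read as a categorization rule.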
Keywords:feature selection  CHI value  rough set theory  decision tree  
This article is indexed in CNKI, Wanfang Data, and other databases.