基于模式聚合和决策树的文本分类规则抽取 Text Categorization Rule Extraction Based on Pattern Aggregation and Decision Tree

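Cite this article:	WANG Yu, WANG Zheng-ou. Text Categorization Rule Extraction Based on Pattern Aggregation and Decision Tree[J]. Information Science (情报科学), 2006, 24(1): 96-99, 123.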

Author names:	WANG Yu (王煜), WANG Zheng-ou (王正欧)

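Affiliations:	1. School of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China; 2. Institute of Systems Engineering, Tianjin University, Tianjin 300072, China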

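Abstract (translated from the Chinese):	This paper first proposes an improved χ² statistic to measure the contribution of each term to text categorization. Then, following pattern aggregation theory, terms whose categorization contributions are distributed across the text classes in similar proportions are aggregated into a single feature, and the feature vector space model of the text collection is built from these features. This approach effectively reduces the dimensionality of the text feature vector space. Finally, a decision tree is used for classification, which preserves classification accuracy while retaining the decision tree's advantage of yielding understandable classification rules that are easy to extract.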
Keywords:	rule extraction; pattern aggregation; χ² statistic; decision tree
Article ID:	1007-7634(2006)01-0096-04
Received:	2005-05-10
Revised:	2005-05-10
Text Categorization Rule Extraction Based on Pattern Aggregation and Decision Tree

WANG Yu, WANG Zheng-ou. Text Categorization Rule Extraction Based on Pattern Aggregation and Decision Tree[J]. Information Science, 2006, 24(1): 96-99, 123.

Authors:	WANG Yu, WANG Zheng-ou

Institution:	1. School of Mathematics and Computer Science, Hebei University, Baoding 071002, China; 2. Institute of Systems Engineering, Tianjin University, Tianjin 300072, China

Abstract:	This paper proposes an improved χ² statistic to measure each term's contribution to text categorization. Using this statistic together with the theory of pattern aggregation, the method builds the text vector space model by merging terms whose categorization contributions are distributed across the classes in approximately the same proportions into a single new feature, which greatly reduces the dimension of the vector space. A decision tree is then applied to text categorization, so that understandable categorization rules are obtained while categorization accuracy is preserved.

Keywords:	rule extraction; pattern aggregation; χ² statistic; decision tree
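
The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the improved χ² statistic, so the code below substitutes the standard χ² statistic for a 2x2 term/class table; this is an assumption, not the authors' formula. The function names (`chi2`, `term_class_chi2`, `aggregate_terms`, `to_vector`), the grid-based grouping, and the `step` parameter are likewise hypothetical choices made for illustration. The decision tree step is omitted; any standard decision tree learner could be trained on the resulting vectors.

```python
from collections import defaultdict


def chi2(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/class table.

    a: docs in the class containing the term
    b: docs outside the class containing the term
    c: docs in the class without the term
    d: docs outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def term_class_chi2(docs, labels):
    """Return {term: {class: chi2}} for docs given as sets of tokens."""
    classes = sorted(set(labels))
    vocab = sorted(set().union(*docs))
    scores = {}
    for t in vocab:
        scores[t] = {}
        for k in classes:
            pairs = list(zip(docs, labels))
            a = sum(1 for doc, y in pairs if t in doc and y == k)
            b = sum(1 for doc, y in pairs if t in doc and y != k)
            c = sum(1 for doc, y in pairs if t not in doc and y == k)
            d = sum(1 for doc, y in pairs if t not in doc and y != k)
            scores[t][k] = chi2(a, b, c, d)
    return scores


def aggregate_terms(scores, step=0.25):
    """Group terms whose per-class shares of chi2 fall in the same grid cell.

    The grid quantization (step) is an illustrative stand-in for
    "approximately the same proportions"; it is not taken from the paper.
    """
    groups = defaultdict(list)
    for term, per_class in scores.items():
        total = sum(per_class.values())
        if total == 0:
            continue  # term carries no class information
        key = tuple(round(v / total / step) for _, v in sorted(per_class.items()))
        groups[key].append(term)
    return list(groups.values())


def to_vector(doc, groups):
    """Represent a document by how many terms of each group it contains."""
    return [sum(1 for t in g if t in doc) for g in groups]


# Toy corpus with three classes (made-up data for illustration only).
docs = [{"goal", "match"}, {"goal", "team"},
        {"stock", "bank"}, {"stock", "market"},
        {"chip", "code"}, {"chip", "cpu"}]
labels = ["sports", "sports", "finance", "finance", "tech", "tech"]

groups = aggregate_terms(term_class_chi2(docs, labels))
vectors = [to_vector(doc, groups) for doc in docs]
```

On this toy corpus the nine vocabulary terms collapse into three aggregated features, one per class, which mirrors the dimensionality reduction the abstract describes. Three classes are used because, with only two classes, the standard χ² statistic gives a term the same score for both classes and the proportions would not discriminate.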
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data (万方数据).