
Research on Automatic Classification of Multi-Class Multi-Label Chinese Text
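Cite this article: Shi Tongnian, Lu Zhongliang, Rong Rong, Wang Jiayun. Research on automatic classification of multi-class multi-label Chinese text [J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2003, 22(3): 306-309.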
Authors: Shi Tongnian, Lu Zhongliang, Rong Rong, Wang Jiayun
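Affiliations: 1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200032
2. School of Electronic Science and Engineering, National University of Defense Technology, Changsha 410073
3. PLA Unit 61587, Shanghai 200336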
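Abstract: This paper proposes an efficient method for classifying Chinese text, which achieved good results in experiments. Because of the particular characteristics of Chinese text, the training texts are preprocessed before training by automatic word segmentation and dimensionality reduction. Since many texts may belong to more than one class, the classifier uses an improved Boosting algorithm. Experiments show that, for feature extraction and document classification of multi-class multi-label Chinese text, the algorithm converges quickly, achieves high accuracy, and performs well overall.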

Keywords: multi-class multi-label; word segmentation; dimensionality reduction; weak hypothesis; weak learning
Revised: April 21, 2002
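The abstract does not say which segmentation or dimensionality-reduction method the authors used. Purely as an illustration, the sketch below pairs two common textbook techniques: dictionary-based forward maximum matching (FMM) for word segmentation, and document-frequency thresholding for feature selection. The lexicon, `max_len`, `min_df`, and all function names are hypothetical choices for this example, not taken from the paper.

```python
from collections import Counter


def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching (illustrative): at each position take the
    longest lexicon word; fall back to a single character if none matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def select_by_df(docs, min_df=2):
    """Dimensionality reduction by document-frequency thresholding (assumed
    method): keep only terms that occur in at least `min_df` documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return sorted(t for t, n in df.items() if n >= min_df)


def to_binary_features(doc, vocab):
    """Encode a segmented document as 0/1 presence indicators over `vocab`."""
    present = set(doc)
    return [1 if t in present else 0 for t in vocab]
```

For example, with the toy lexicon `{"汉语", "文本", "自动", "分类"}`, `fmm_segment("汉语文本自动分类", lexicon)` returns `["汉语", "文本", "自动", "分类"]`. FMM is greedy, so it can mis-segment ambiguous strings; it is shown here only because it is short and easy to follow.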

Research on the Chinese Text Categorization of Multi-Classification and Multi-Label
Shi Tongnian. Research on the Chinese Text Categorization of Multi-Classification and Multi-Label [J]. Journal of the China Society for Scientific and Technical Information, 2003, 22(3): 306-309.
Authors: Shi Tongnian, Lu Zhongliang, Rong Rong, Wang Jiayun
Abstract: This paper proposes an efficient method for Chinese text categorization that achieved good experimental results. Because of the particular characteristics of Chinese text, the training texts are first preprocessed by word segmentation and space (dimensionality) reduction. Since a given text can often belong to several classes, the classifier is based on an improved Boosting algorithm. Experiments show that, for feature extraction and document classification of Chinese text in the multi-class, multi-label setting, the algorithm converges quickly and achieves high accuracy, which improves classification efficiency.
Keywords: multi-classification and multi-label; word segmentation; space reduction; weak hypotheses; weak learner
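The abstract does not describe what the "improved Boosting algorithm" changes. For orientation only, here is a minimal sketch of the standard AdaBoost.MH scheme of Schapire and Singer for multi-label classification, using one-term decision stumps as the weak hypotheses. It is not the authors' improved variant. The function names, the 0/1 term-presence features, and the smoothing constant `eps` (set to 1/(m·k), a commonly used choice) are assumptions for this example.

```python
import math


def train_adaboost_mh(X, Y, rounds=10, eps=None):
    """Real AdaBoost.MH with one-feature stumps (illustrative sketch).
    X: list of 0/1 feature vectors; Y: list of label vectors with +1/-1.
    Returns a list of (feature index j, c) where c[b][l] is the stump's
    real-valued prediction for label l when feature j equals b."""
    m, k, n_features = len(X), len(Y[0]), len(X[0])
    if eps is None:
        eps = 1.0 / (m * k)  # smoothing to avoid log(0)
    # Distribution over (document, label) pairs, initially uniform.
    D = [[1.0 / (m * k)] * k for _ in range(m)]
    model = []
    for _ in range(rounds):
        best = None
        for j in range(n_features):
            # Weight of positive / negative pairs in block b (feature absent/present).
            wpos = [[0.0] * k for _ in range(2)]
            wneg = [[0.0] * k for _ in range(2)]
            for i in range(m):
                b = X[i][j]
                for l in range(k):
                    if Y[i][l] > 0:
                        wpos[b][l] += D[i][l]
                    else:
                        wneg[b][l] += D[i][l]
            # Normalizer Z; the weak learner picks the stump that minimizes it.
            z = 2.0 * sum(math.sqrt(wpos[b][l] * wneg[b][l])
                          for b in range(2) for l in range(k))
            if best is None or z < best[0]:
                best = (z, j, wpos, wneg)
        _, j, wpos, wneg = best
        c = [[0.5 * math.log((wpos[b][l] + eps) / (wneg[b][l] + eps))
              for l in range(k)] for b in range(2)]
        model.append((j, c))
        # Up-weight pairs the new stump gets wrong, then renormalize.
        total = 0.0
        for i in range(m):
            for l in range(k):
                D[i][l] *= math.exp(-Y[i][l] * c[X[i][j]][l])
                total += D[i][l]
        for i in range(m):
            for l in range(k):
                D[i][l] /= total
    return model


def score(model, x, k):
    """Sum the stumps' real-valued predictions for each label."""
    f = [0.0] * k
    for j, c in model:
        for l in range(k):
            f[l] += c[x[j]][l]
    return f


def predict(model, x, k):
    """Return the indices of all labels with a positive combined score."""
    return [l for l, s in enumerate(score(model, x, k)) if s > 0]
```

Because each label is thresholded independently, `predict` can assign a document zero, one, or several labels, which is what the multi-label setting requires.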
This article is indexed in CNKI, Wanfang Data, and other databases.
