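Research on Text Classification Based on Semi-supervised Learning and Word Vector Weighting
Cite this article: HE Ke-qiang (贺可强). Research on Text Classification Based on Semi-supervised Learning and Word Vector Weighting[J]. Introduction of Educational Technology (教育技术导刊), 2009, 8(9): 27-30.
Author: HE Ke-qiang (贺可强)
Affiliation: School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590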
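Abstract: Supervised learning for text classification usually needs a large number of accurately labeled samples, yet labeling that many samples by hand is difficult. To address this, a new semi-supervised learning approach is proposed. Co-training is used to make effective use of a large number of unlabeled training samples. Adding text-feature noise that differs between the classifiers prevents the model parameters from converging to the same values, a known problem of traditional co-training semi-supervised methods, and also improves the classification ability of the model. In traditional deep learning methods, all text features receive the same weight, so class-specific features are not emphasized. To address this, a self-attention mechanism is added to the model to extract weights for sentence features, and sentence weighting is used to emphasize class-specific features. Experimental results show that, trained in a semi-supervised manner with only a small amount of labeled data, the model reaches an accuracy of 91.4% and a recall of 84.3%, close to the accuracy of supervised training. The method thus avoids large-scale manual labeling and has practical value.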
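The abstract does not specify the classifiers, feature views, or noise model used in co-training. The Python sketch below illustrates only the general idea, under stated assumptions: a hypothetical nearest-centroid classifier stands in for the text classifier, documents are assumed to be dense feature vectors already, and each classifier is trained on Gaussian feature noise drawn from its own random generator so that the two models do not collapse to identical parameters. In each round, every model pseudo-labels the unlabeled samples it is most confident about (largest distance margin) and adds them to the other model's training pool. All names (`CentroidClassifier`, `co_train`, `sigma`, `per_round`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


class CentroidClassifier:
    """Hypothetical stand-in for a text classifier: nearest class centroid."""

    def __init__(self) -> None:
        self.centroids: Dict[int, Vector] = {}

    def fit(self, xs: List[Vector], ys: List[int]) -> None:
        sums: Dict[int, Vector] = {}
        counts: Dict[int, int] = {}
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict_with_margin(self, x: Vector) -> Tuple[int, float]:
        # Sort classes by distance; the gap between the closest and the
        # second-closest centroid serves as a confidence score.
        ranked = sorted((math.dist(x, c), y) for y, c in self.centroids.items())
        best_dist, best_label = ranked[0]
        margin = ranked[1][0] - best_dist if len(ranked) > 1 else math.inf
        return best_label, margin


def add_noise(xs: List[Vector], sigma: float, rng: random.Random) -> List[Vector]:
    """Add Gaussian feature noise (an assumed noise model, not the paper's)."""
    return [[v + rng.gauss(0.0, sigma) for v in x] for x in xs]


def co_train(labeled, unlabeled, rounds=3, per_round=2, sigma=0.1, seed=0):
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    # Each model keeps its own, growing training pool.
    pools = [(list(xs), list(ys)), (list(xs), list(ys))]
    # Separate generators, so each model is trained on different noise.
    rngs = [random.Random(seed), random.Random(seed + 1)]
    models = [CentroidClassifier(), CentroidClassifier()]
    remaining = list(unlabeled)

    def fit_all() -> None:
        for model, (px, py), rng in zip(models, pools, rngs):
            model.fit(add_noise(px, sigma, rng), py)

    for _ in range(rounds):
        fit_all()
        if not remaining:
            break
        for k in (0, 1):
            teacher = models[k]
            student_xs, student_ys = pools[1 - k]
            # Most confident unlabeled samples first.
            ranked = sorted(remaining, key=lambda x: -teacher.predict_with_margin(x)[1])
            for x in ranked[:per_round]:
                student_xs.append(x)
                student_ys.append(teacher.predict_with_margin(x)[0])
                remaining.remove(x)
    fit_all()  # final fit on the enlarged pools
    return models
```

Usage: `co_train([([0.0, 0.0], 0), ([5.0, 5.0], 1)], unlabeled_vectors)` returns two models whose pools were grown from each other's pseudo-labels; averaging or voting over their predictions is one possible way to combine them.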

Keywords: text classification; supervised learning; manual annotation; semi-supervised learning; co-training; self-attention mechanism
Received: 2019-12-10

Research on Text Classification Based on Semi-supervised and Word Vector Weighting
SONG Jian-guo. Research on Text Classification Based on Semi-supervised and Word Vector Weighting[J]. Introduction of Educational Technology, 2009, 8(9): 27-30.
Authors: SONG Jian-guo
Institution: School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Abstract: Supervised learning for text classification usually requires a large number of accurately labeled samples, but labeling so many samples manually is difficult. This paper proposes a new semi-supervised learning method that makes effective use of a large number of unlabeled training samples through co-training. Adding text-feature noise that differs between the classifiers solves the problem, common in traditional co-training semi-supervised methods, that the model parameters tend to become identical, and it also improves classification ability. In traditional deep learning methods, all text features are weighted equally, so class-specific features are not emphasized. To address this, a self-attention mechanism is added to the model to extract the feature weights of text sentences, and sentence weighting is used to highlight class-specific features. Experimental results show that with semi-supervised training on only a small amount of labeled data, the model achieves an accuracy of 91.4% and a recall of 84.3%, comparable to supervised training. This removes much of the time-consuming manual labeling, and the method has practical application value.
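The abstract names self-attention over sentence features but gives no formulation. The following is a minimal, parameter-free Python sketch under assumptions of our own: sentences are already encoded as vectors, and scaled dot-product attention is applied without learned query/key/value projections. Each sentence attends to every sentence; the weight of a sentence is taken as the average attention it receives (the column mean of the attention matrix), and the document vector is the weighted sum of sentence vectors. The column-mean weighting rule and the function names are hypothetical illustrations.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(sentences: List[Vector]) -> Vector:
    """Scaled dot-product self-attention with Q = K = V = sentence vectors."""
    d = len(sentences[0])
    scale = math.sqrt(d)
    # Row i: attention distribution of sentence i over all sentences.
    attn = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in sentences])
        for q in sentences
    ]
    n = len(sentences)
    # Weight of sentence j: mean attention it receives (column mean).
    # Each row sums to 1, so these weights also sum to 1.
    return [sum(row[j] for row in attn) / n for j in range(n)]


def weighted_document_vector(sentences: List[Vector]) -> Tuple[Vector, Vector]:
    """Return the sentence-weighted document vector and the weights used."""
    w = self_attention_weights(sentences)
    d = len(sentences[0])
    doc = [sum(w[i] * sentences[i][k] for i in range(len(sentences))) for k in range(d)]
    return doc, w
```

Under this rule, sentences that resemble many other sentences receive larger weights. A trained model would learn projection matrices instead, and only then could the weights single out class-specific sentences as the abstract describes.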
Keywords: text classification; supervised learning; manual annotation; semi-supervised learning; co-training; self-attention mechanism
This article is indexed in VIP (维普) and other databases.