共查询到17条相似文献,搜索用时 953 毫秒
1.
2.
研究了中文词自动分类问题。针对传统的蚁群算法中文词语分类精确度低等问题,提出了一种将蚁群算法应用到了中文词语自动分类中。方法建立在首先对大规模语料文本进行统计和计算的基础上,得到词的一元和二元信息,然后采用了蚁群算法对该信息进行词的分类。实验结果表明,提出的算法有效提高了词语分类的精确度。 相似文献
3.
4.
针对目前基于主题图的中文自动分类的空缺,文章在总结Ontopia对英文和挪威文自动分类的技术基础上,结合中文特殊性,构建了一个基于主题图的中文分类原型系统。该系统通过借助POI、PDF、SAX作为文档文本解析器提取文本,采用盘古分词对文本进行分析,以Java为系统实现主要语言,达到了基于主题图的中文自动分类的目的。 相似文献
5.
6.
在文本自动分类中,目前有词频和文档频率统计这两种概率估算方法,采用的估算方法恰当与否会直接影响特征抽取的质量与分类的准确度。本文采用K最近邻算法实现中文文本分类器,在中文平衡与非平衡两种训练语料下进行了训练与分类实验,实验数据表明使用非平衡语料语料时,可以采用基于词频的概率估算方法,使用平衡语料语料时,采用基于文档频率的概率估算方法,能够有效地提取高质量的文本特征,从而提高分类的准确度。 相似文献
7.
8.
9.
10.
基于贝叶斯方法在垃圾邮件处理上具有速度快、准确率高的优点,基于贝叶斯分类的垃圾邮件分类方法受到广泛的关注.我们主要研究制约中文邮件过滤效果的中文分词方法,比较基于统计的多种方法,并根据需要对其中几种算法进行改进. 相似文献
11.
垃圾邮件的泛滥提出了极为迫切的技术诉求,文章介绍了基于文本分类技术的垃圾邮件过滤系统模型,首先介绍了整个系统工作流程,然后阐述了系统中文本分词,文本特征提取,Winnow线性分类器等关键环节。 相似文献
12.
13.
简要介绍语义模板的概念,提出基于语义模板向量空间的文档自动分类模型。利用支持向量机(SVM,Support Vector Machine)分类算法对文档测试集进行基于语义模板空间、词向量空间的分类实验,实验结果表明,基于语义模板空间的文本分类性能比基于词向量空间的分类性能要高。 相似文献
14.
《Information processing & management》2022,59(2):102798
Automated legal text classification is a prominent research topic in the legal field. It lays the foundation for building an intelligent legal system. Current literature focuses on international legal texts, such as Chinese cases, European cases, and Australian cases. Little attention is paid to text classification for U.S. legal texts. Deep learning has been applied to improving text classification performance. Its effectiveness needs further exploration in domains such as the legal field. This paper investigates legal text classification with a large collection of labeled U.S. case documents through comparing the effectiveness of different text classification techniques. We propose a machine learning algorithm using domain concepts as features and random forests as the classifier. Our experiment results on 30,000 full U.S. case documents in 50 categories demonstrated that our approach significantly outperforms a deep learning system built on multiple pre-trained word embeddings and deep neural networks. In addition, applying only the top 400 domain concepts as features for building the random forests could achieve the best performance. This study provides a reference to select machine learning techniques for building high-performance text classification systems in the legal domain or other fields. 相似文献
15.
《Information processing & management》2023,60(2):103192
The replies of people seeking support in online mental health communities can be analyzed to discover if they feel better after receiving support; feeling better indicates a cognitive change. Most research uses key phrase matching and word frequency statistics to identify psychological cognitive change, methods that result in omissions and inaccuracy. This study constructs an intelligent method for identifying psychological cognitive change based on natural language processing technology. It incorporates information related to emotions that appears in reply text to help identify whether psychological cognitive change has occurred. The model first encodes the emotion information based on rule matching and manual annotation, then adds the encoded emotion lexicon and a cognitive change lexicon to a word2vec high-dimensional semantic word vector training, converts the annotated cognitive change recognition text into a vector matrix using the trained model, and train in the annotated text using TextCNN. To compare the results with those of the traditional methods (key phrase matching and sentiment word frequency statistics), this study uses a semi-automated approach to construct a lexicon of psychological cognitive change, as well as a keyword lexicon without cognitive change, based on word vectors and similarity. We compare the performance of the classifier before and after the fusion of the graphical emotion information, compare the LSTM and Transformer as baselines, and compare traditional word frequency statistics methods. The experimental results show that our proposed classification model performs better than the others; it achieves 84.38% precision, an 84.09% recall rate, and an 84.17% F1 value. Our work bears methodological implications for online mental health platforms. 相似文献
16.
为解决社区问答系统中的问题短文本特征词少、描述信息弱的问题,本文利用维基百科进行特征扩展以辅助中文问题短文本分类。首先通过维基百科概念及链接等信息进行词语相关概念集合抽取,并综合利用链接结构和类别体系信息进行概念间相关度计算。然后以相关概念集合为基础进行特征扩展以补充文本特征语义信息。实验结果表明,本文提出的基于特征扩展的短文本分类算法能有效提高问题短文本分类效果。 相似文献
17.
借助文本分类系统软件,采用来自10个大类的中文文本数据,按照训练集与测试集2:1的比例,使用KNN和SVM分类算法,对数据集进行自动分类的实验。旨在通过具体的语料库实验,探讨文本自动分类的关键技术,分析、比较与评价实验结果,探讨文本分类中具体参数的设置和不同分类算法之优劣。 相似文献