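Research and Simulation of a Text Classification Method Based on Latent Text Features
Cite this article: WU Guimei. Research and Simulation of a Text Classification Method Based on Latent Text Features [J]. 科技通报 (Bulletin of Science and Technology), 2012, 28(7): 148-151.
Author: WU Guimei
Affiliation: Institute of Network and Educational Technology, Guangdong University of Technology, Guangdong 510006
Abstract: This paper studies the problem of classifying text quickly and accurately. The same word may carry different meanings in different linguistic contexts or when used by different people, yet such words have very similar descriptive features in text classification. Traditional text classification methods represent text with the vector space model, which is built solely from word occurrence frequencies; when a text contains polysemous or synonymous words, classification is noticeably delayed and accuracy is low. To address this, a text topic matching method based on semantic indexing is proposed. Keywords are extracted from the text to build a document-term matrix; after SVD, matrix dimensionality reduction and similarity computation are carried out through a balance optimization method, overcoming the drawbacks of the traditional approach. Practice shows that the method greatly reduces the influence of synonyms and polysemous words on text classification, making topic-based matching and classification accurate and efficient, with markedly improved experimental results.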

Keywords: text topic matching; balance optimization; latent semantic indexing

Text Classification Method Research and Potential Characteristic Simulation
WU Guimei. Text Classification Method Research and Potential Characteristic Simulation [J]. Bulletin of Science and Technology, 2012, 28(7): 148-151.
Authors: WU Guimei
Institution: Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou 510006, China
Abstract: This paper studies the problem of clustering text quickly and accurately. Text clustering places high demands on both accuracy and speed, and clustering quality suffers when the same word expresses different meanings or different words express the same meaning. The traditional text clustering method represents text with a vector space model, which is built only from word occurrence frequencies; when polysemy and synonymy occur, clustering is delayed and accuracy is low. This paper presents a text clustering method based on semantic indexing. Keywords are extracted from the text to construct a document-term matrix; after SVD, a balance optimization method is used for matrix dimensionality reduction and similarity calculation, overcoming the drawbacks of the traditional method. Practice shows that the method greatly reduces the effect of synonymy and polysemy on text clustering, so that clustering is accurate and efficient and the experimental results improve markedly.
Keywords: clustering; balance optimization; latent semantic indexing
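
The pipeline named in the abstract (document-term matrix, SVD, dimensionality reduction, similarity calculation) can be sketched as follows. This is a minimal illustration of standard latent semantic indexing using NumPy, not the paper's implementation. Whitespace tokenization stands in for keyword extraction, a fixed rank `k` stands in for the balance optimization step (which the abstract does not specify), and the function names and toy corpus are hypothetical.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Return (vocab, A), where A[i, j] counts term i in document j.

    Assumption: whitespace tokenization replaces real keyword extraction.
    """
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            A[index[word], j] += 1.0
    return vocab, A


def lsi_document_vectors(A, k):
    """Represent each document in a k-dimensional latent space.

    Assumption: plain truncated SVD with a caller-chosen rank k; the
    paper's balance optimization for choosing the reduction is not shown.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Row j of the result is document j, i.e. column j of diag(s_k) @ Vt_k.
    return (s[:k, None] * Vt[:k, :]).T


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


# Hypothetical toy corpus: documents 0 and 1 share no words, but both
# co-occur with the words of document 2. Documents 3-5 form a second topic.
docs = [
    "car road",
    "automobile engine",
    "car automobile engine road",
    "flower garden",
    "rose petal",
    "flower rose petal garden",
]

vocab, A = build_term_document_matrix(docs)
vecs = lsi_document_vectors(A, k=2)

raw_similarity = cosine(A[:, 0], A[:, 1])         # term-frequency vectors
latent_similarity = cosine(vecs[0], vecs[1])      # same topic, latent space
cross_topic_similarity = cosine(vecs[0], vecs[3]) # different topics
```

On this toy corpus the raw term-frequency vectors of documents 0 and 1 are orthogonal because they share no words, while their latent vectors are nearly parallel, which is the synonym effect the abstract attributes to semantic indexing. Documents from the two different topics remain nearly orthogonal in the latent space.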
This article is indexed in CNKI, Wanfang Data, and other databases.