
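A Review of Text Theme Identification Methods Based on Graph Mining
Cite this article: GUO Hongmei (郭红梅), ZHANG Zhixiong (张智雄). A Review of Text Theme Identification Methods Based on Graph Mining [J]. Journal of Library Science in China (中国图书馆学报), 2015, 41(6): 97-108.
Authors: GUO Hongmei; ZHANG Zhixiong
Affiliations: Chinese Academy of Sciences, Beijing 100190; Chinese Academy of Sciences, Beijing 100190
Funding: This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on Text Theme Centrality Computation Methods Based on Language Networks" (Grant No. 61075047).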
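Abstract: Based on a literature survey, this paper groups graph-mining-based text theme identification methods into three types: centrality methods, cohesive subgraph detection, and graph clustering. The latter two are further divided, respectively, into methods based on cliques or quasi-cliques and methods that cluster by graph topology or by node attributes. Centrality methods identify text themes by comparing the importance of term nodes in the text network, whereas cohesive subgraph detection and graph clustering identify the core themes of a text according to the attribute similarity of term nodes and edges in the text graph. Given the particular characteristics of language text networks, several questions need further study: how to construct a complex text relation graph that reveals the syntactic, co-occurrence, and semantic relations between terms at the same time; how to identify cohesive subgroups based on term associations and graph topology; and by what criterion to cluster those cohesive subgroups so as to reveal the core themes of a text. 1 table.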

Keywords: Text theme identification; Graph mining; Centrality; Clique subgroup
Received: 2015-07-31
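
The centrality approach summarized in the abstract can be illustrated with a small sketch. It is not the authors' method: it assumes that terms have already been extracted per sentence, that an edge links two terms whenever they share a sentence, and that plain degree centrality stands in for node importance. The function names and the `SENTENCES` sample are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Build an undirected graph: terms are nodes, and an edge links
    two terms that appear in the same sentence (an assumed relation)."""
    graph = defaultdict(set)
    for terms in sentences:
        unique = set(terms)
        for term in unique:
            graph[term]  # make sure isolated terms still become nodes
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree centrality: number of neighbors divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neigh) / (n - 1) for node, neigh in graph.items()}


def top_terms(sentences, k=3):
    """Rank terms by centrality, breaking ties alphabetically."""
    scores = degree_centrality(build_cooccurrence_graph(sentences))
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical pre-tokenized input: one list of terms per sentence.
SENTENCES = [
    ["graph", "mining", "text"],
    ["text", "theme", "centrality"],
    ["graph", "clique", "text"],
]

if __name__ == "__main__":
    print(top_terms(SENTENCES, k=2))
```

Degree centrality is only the simplest choice here; other importance measures could be substituted in `degree_centrality` without changing the rest of the sketch.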

Methods of Text Theme Identification Based on Graph Mining
GUO Hongmei and ZHANG Zhixiong. Methods of Text Theme Identification Based on Graph Mining [J]. Journal of Library Science in China, 2015, 41(6): 97-108.
Authors: GUO Hongmei and ZHANG Zhixiong
Abstract: With the development of the internet, the volume of electronic text is growing rapidly. These text resources, especially scientific journal papers, contain rich semantic and link information. How to present their core topics quickly and accurately, so as to assist researchers and improve research efficiency, has become an urgent issue in text mining. The nodes and edges of a graph can represent the terms of a text and the relations between them, so many researchers have tried to combine graph mining with natural language processing to identify text themes. This paper surveys and analyzes these studies and summarizes their advantages and disadvantages in order to provide a reference for further research.
At present, studies focus on the graph representation of textual relations, on theme identification based on centrality, and on cohesive subgraph detection or graph clustering. Theme identification based on cohesive subgraph detection mainly recognizes clique or quasi-clique subgraphs that represent the core content of a text. Theme identification based on graph clustering takes two forms: one relies on graph topological structure alone, and the other considers graph topological structure and node attributes simultaneously. For these methods we mainly analyze the clustering model, the algorithm, and the criteria for evaluating clustering results. Methods based on frequency statistics and external dictionaries are relatively mature and are often used as benchmarks. Centrality methods have improved greatly, but their algorithmic efficiency still needs work. Methods based on graph mining have already shown advantages and are worth deeper exploration.
The language network of a text has its own characteristics. Various relations exist between terms, for example co-occurrence, syntactic, and semantic relations. How to construct a complex text network that reveals all of these relations at the same time is one direction for future research. Further studies also need to address how to identify cohesive subgraphs in a complex text network according to the relations between terms and the topological structure of the graph. In addition, the measure by which these subgraphs are clustered to reveal the core sub-themes of a text and the relations among its themes needs to be discussed. 1 tab. 50 refs.
Keywords: Text theme identification; Graph mining; Centrality; Clique sub-group
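
As a rough illustration of cohesive subgraph detection, the sketch below enumerates maximal cliques in a small term graph with the classic Bron-Kerbosch algorithm (without pivoting) and keeps those above a minimum size as candidate core-content groups. This is an assumed stand-in, not an algorithm taken from the paper; the `TERM_GRAPH` data, the function names, and the `min_size` threshold are hypothetical.

```python
def maximal_cliques(graph):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting).

    `graph` maps each node to the set of its neighbors (undirected).
    """
    cliques = []

    def expand(current, candidates, excluded):
        # A clique is maximal when nothing can extend it.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(
                current | {node},
                candidates & graph[node],
                excluded & graph[node],
            )
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques


def cohesive_term_groups(graph, min_size=3):
    """Keep maximal cliques with at least `min_size` terms, sorted for
    stable output. The size threshold is an illustrative assumption."""
    groups = [tuple(sorted(c)) for c in maximal_cliques(graph)
              if len(c) >= min_size]
    return sorted(groups)


# Hypothetical term graph: an edge means two terms are closely related.
TERM_GRAPH = {
    "graph": {"mining", "text", "clique"},
    "mining": {"graph", "text"},
    "text": {"graph", "mining", "theme", "clique"},
    "theme": {"text"},
    "clique": {"graph", "text"},
}

if __name__ == "__main__":
    for group in cohesive_term_groups(TERM_GRAPH):
        print(group)
```

Exact cliques are a strict criterion; a quasi-clique variant would relax the requirement that every pair of terms in a group be connected, at the cost of a more involved search.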