
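Automatic Extraction of Semantic Information from Chinese Patent Texts Using Graph Representations
Cite this article: Jiang Chuntao. Automatic extraction of semantic information from Chinese patent texts using graph representations[J]. Library and Information Service, 2015, 59(21): 115-122.
Author: Jiang Chuntao
Affiliations: Department of Computer Science and Technology, Nanjing University, Nanjing 210023; Patent Information Service Center of Jiangsu Province, Nanjing 210008
Abstract: [Purpose/significance] This paper proposes using graph-structured representations to automatically mine semantic information from Chinese patent texts, in order to provide semantic support for intelligent patent analysis based on text content. [Method/process] Two graph-based models are designed: ① a keyword-based text graph model and ② a dependency-tree-based text graph model. The first model is defined by the similarity relations computed between keywords; the second is defined by the grammatical relations extracted from sentences. In the case study, a frequent subgraph mining algorithm is applied to the constructed graph models, and text classifiers that use the mined subgraphs as features are built to test the expressiveness and effectiveness of the graph models. [Result/conclusion] The graph-based text classifiers were applied to patent text datasets from four different technology domains and compared with classic text classifiers. Using noticeably fewer features, the graph-based classifiers improved classification performance by 2.1%-10.5% over the classic ones. It can therefore be inferred that semantic information extracted from patent texts with graph representations combined with graph mining techniques is effective and supports further patent text analysis.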

Keywords: graph representation  patent information extraction  frequent subgraph mining  patent classification
Received: 2015-08-20
Revised: 2015-10-18
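The keyword-based graph model is characterized above only as a graph defined by similarity relations between keywords; the abstract names neither the keyword extractor nor the similarity measure. The following is a minimal Python sketch under stated assumptions: word segmentation and keyword extraction are taken as already done, the similarity of two keywords is assumed to be the Jaccard index of the sets of sentences containing them, and a hypothetical `threshold` decides whether an edge is kept. The function name `build_keyword_graph` and its parameters are illustrative, not the author's.

```python
from itertools import combinations


def build_keyword_graph(sentences, keywords, threshold=0.3):
    """Return weighted edges {(kw_a, kw_b): similarity} of a keyword graph.

    sentences: list of token lists (Chinese text already word-segmented)
    keywords:  keywords already extracted from the same text
    threshold: hypothetical cut-off; pairs below it get no edge
    """
    # Assumption: a keyword's context is the set of sentence indices it occurs in.
    occurs = {k: {i for i, toks in enumerate(sentences) if k in toks} for k in keywords}
    edges = {}
    for a, b in combinations(sorted(occurs), 2):
        union = occurs[a] | occurs[b]
        if not union:  # neither keyword occurs; similarity undefined, skip
            continue
        # Assumed similarity: Jaccard index of the two sentence sets
        sim = len(occurs[a] & occurs[b]) / len(union)
        if sim >= threshold:
            edges[(a, b)] = round(sim, 3)
    return edges


if __name__ == "__main__":
    sents = [["电池", "电极", "材料"], ["电极", "涂层"], ["电池", "电极"]]
    print(build_keyword_graph(sents, ["电池", "电极", "涂层"]))
```

With these three sentences, 电池 and 电极 co-occur in two of the three sentences (similarity 2/3), 电极 and 涂层 in one of three (1/3), and 电池 and 涂层 never co-occur, so only two edges survive the 0.3 threshold.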

Applying Graph Representations to Automatic Extraction of Semantic Information from Chinese Patent Text
Jiang Chuntao. Applying Graph Representations to Automatic Extraction of Semantic Information from Chinese Patent Text[J]. Library and Information Service, 2015, 59(21): 115-122.
Author: Jiang Chuntao
Institutions: Department of Computer Science and Technology, Nanjing University, Nanjing 210023; Patent Information Service Center of Jiangsu Province, Nanjing 210008
Abstract: [Purpose/significance] This paper proposes a graph-representation-based approach to automatically extract semantic information from Chinese patent texts; such information can provide semantic support for intelligent patent analysis based on text content. [Method/process] The author devised two graph models: ① a keyword-based text graph model and ② a dependency-tree-based text graph model. The first model was constructed by computing the similarity between each pair of keywords; the second was constructed by extracting syntactic relations from the sentences of the text. In the case study, the author applied a frequent subgraph mining algorithm to discover frequent subgraph patterns and used these patterns as features to build text classifiers, in order to test the expressiveness and effectiveness of the graph models. [Result/conclusion] The resulting text classifiers were tested on patent datasets from four different technology domains and compared with a classic text classifier. The experimental results show that the two graph-based classifiers outperform the classic classifier by 2.1%-10.5% while using fewer features. It can therefore be inferred that using graph representations together with graph mining techniques to extract semantic information from patent texts is effective and facilitates further patent text analysis.
Keywords: graph representations  patent information extraction  frequent subgraph mining  patent classification
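The dependency-tree model and the subgraph features described in the abstract can be illustrated with a small Python sketch. It assumes each patent text has already been parsed by an external dependency parser into (head, relation, dependent) triples; the relation labels in the example (`SBV`, `VOB`) are illustrative only. As a deliberate simplification, it mines only connected patterns of one or two edges and identifies nodes by their word, whereas the abstract does not name the frequent subgraph mining algorithm used, and a full miner such as gSpan enumerates larger subgraphs. Frequent patterns become binary features that a standard classifier could consume; `min_support` and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def doc_patterns(triples):
    """Candidate subgraphs (1 or 2 connected edges) of one dependency graph.

    triples: (head, relation, dependent) edges, assumed to come from an
    external parser. Nodes are identified by their word, which merges
    repeated words (a simplification).
    """
    edges = set(triples)
    patterns = {frozenset([e]) for e in edges}  # single-edge subgraphs
    for e1, e2 in combinations(edges, 2):
        # two edges form a connected subgraph if they share a node
        if {e1[0], e1[2]} & {e2[0], e2[2]}:
            patterns.add(frozenset([e1, e2]))
    return patterns


def frequent_patterns(docs, min_support=2):
    """Patterns present in at least min_support documents (document frequency)."""
    support = Counter()
    for triples in docs:
        support.update(doc_patterns(triples))  # a set, so each doc counts once
    frequent = [p for p, c in support.items() if c >= min_support]
    return sorted(frequent, key=sorted)  # deterministic feature order


def to_features(triples, vocab):
    """Binary feature vector: 1 where the document contains the pattern."""
    present = doc_patterns(triples)
    return [1 if p in present else 0 for p in vocab]


if __name__ == "__main__":
    docs = [
        [("提高", "SBV", "装置"), ("提高", "VOB", "效率")],
        [("提高", "SBV", "方法"), ("提高", "VOB", "效率")],
        [("降低", "VOB", "成本")],
    ]
    vocab = frequent_patterns(docs)
    print([sorted(p) for p in vocab])
    print([to_features(d, vocab) for d in docs])
```

In this toy input only the shared 提高-VOB-效率 edge appears in two documents, so the frequent-pattern vocabulary has a single entry and the three documents map to the vectors [1], [1] and [0].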