
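Patent Term Extraction Based on Conditional Random Fields
Cite this article: LIU Hui, LIU Yao. Patent Term Extraction Based on Conditional Random Fields[J]. Digital Library Forum, 2014(12): 46-49.
Authors: LIU Hui  LIU Yao
Affiliation: Institute of Scientific and Technical Information of China, Beijing 100038
Funding: This research was supported by the "Twelfth Five-Year Plan" National Science and Technology Support Program project "Research on Key Technologies for Patent Information Resource Mining and Discovery" (No. 2013BAH21B02).
Abstract: Patent term extraction is an important task in information extraction from patent documents. It supports the construction of patent-domain vocabularies and benefits Chinese word segmentation, syntactic parsing, grammatical analysis, and related work. This paper analyzes the characteristics of patent terms, formulates corresponding corpus annotation rules, and annotates the corpus manually. Conditional random fields (CRFs) are then trained and tested on the annotated data to extract terms in the telecommunications domain. The annotation uses character-based sequence labeling; precision, recall, and F-value reach 80.9%, 75.6%, and 78.2% respectively, outperforming methods that use words, part-of-speech tags, and similar information as features, which indicates that the proposed patent term extraction method is effective.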

Keywords: Conditional random fields  Term extraction  Sequence labeling
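
The precision, recall, and F-value quoted in the abstract above are related by the usual harmonic mean, F = 2PR / (P + R). The following is a minimal sketch of that arithmetic, assuming the balanced F1 measure and exact matching of whole terms (the abstract states neither). The function names and the toy term sets are hypothetical and are not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: set, predicted: set) -> tuple:
    """Exact-match precision, recall and F over sets of extracted terms."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Figures reported in the abstract, in percent: P = 80.9, R = 75.6.
reported_f = round(f_score(80.9, 75.6), 1)

# Hypothetical toy data: 3 gold terms, 4 predicted terms, 2 in common.
gold_terms = {"基站", "信道", "移动终端"}
predicted_terms = {"基站", "信道", "终端", "所述方法"}
p, r, f = evaluate(gold_terms, predicted_terms)
```

With the abstract's figures, `f_score(80.9, 75.6)` rounds to 78.2, which matches the reported F-value.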

Patent Term Extraction Based on Conditional Random Fields
LIU Hui,LIU Yao.Patent Term Extraction Based on Conditional Random Fields[J].Digital Library Forum,2014(12):46-49.
Authors:LIU Hui  LIU Yao
Institution:(Institute of Scientific and Technical Information of China, Beijing 100038, China)
Abstract:Patent term extraction is an important task in patent information extraction; it benefits the construction of domain lexicons, word segmentation, and parsing. The corpus is labeled manually according to rules written after analyzing the characteristics of patent terms. Conditional random fields (CRFs) are adopted to train and test on the labeled data. Sequence labeling is based on single Chinese characters. Experimental results show that the precision, recall, and F-score are 80.12%, 74.2%, and 76.9% respectively, which are superior to those of methods based on sequence labeling of words. The results illustrate that the established model for extracting patent terms is effective.
Keywords:Conditional random fields  Term extraction  Sequence labeling
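
Character-based sequence labeling, as described in the abstract, assigns one tag to each Chinese character rather than to each segmented word. The sketch below illustrates the idea under stated assumptions: it uses a BIO tag scheme (the abstract does not name the tag set), takes term positions as (start, end) character offsets, and uses a hypothetical example sentence and hypothetical function names. It only prepares and decodes the labels; CRF training itself is omitted.

```python
def chars_to_bio(sentence, term_spans):
    """Tag each character: B = first char of a term, I = inside, O = outside.

    term_spans holds (start, end) offsets with end exclusive.
    """
    labels = ["O"] * len(sentence)
    for start, end in term_spans:
        if not (0 <= start < end <= len(sentence)):
            raise ValueError(f"invalid span: {(start, end)}")
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return list(zip(sentence, labels))


def bio_to_terms(tagged):
    """Recover term strings from a sequence of (character, label) pairs."""
    terms, current = [], ""
    for ch, label in tagged:
        if label == "B":
            if current:
                terms.append(current)
            current = ch
        elif label == "I" and current:
            current += ch
        else:
            if current:
                terms.append(current)
            current = ""
    if current:
        terms.append(current)
    return terms


# Hypothetical sentence from the telecommunications domain, with two terms.
sentence = "基站向移动终端发送信号"
tagged = chars_to_bio(sentence, [(0, 2), (3, 7)])
terms = bio_to_terms(tagged)
```

Because the labels attach to characters, this representation does not depend on a prior word segmentation step, which is one plausible reason a character-based model could outperform word-based features on unseen terms.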
This article is indexed in VIP (维普) and other databases.