
The Construction of Move Recognition Corpus for Project Application Abstract
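Cite this article: Zhao Yang, Zhang Zhixiong, Li Jie. The Construction of Move Recognition Corpus for Project Application Abstract[J]. Library and Information Service (图书情报工作), 2022, 66(21): 97-106.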
Authors: Zhao Yang, Zhang Zhixiong, Li Jie
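Affiliations: 1. National Science Library, Chinese Academy of Sciences, Beijing 100190; 2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190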
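Funding: This article is one of the research outcomes of the sub-project "Construction of an Artificial Intelligence (AI) Engine Based on Scientific and Technical Literature Knowledge" (project No. E0290906) under the Chinese Academy of Sciences Special Program for Literature and Information Capacity Building.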
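Abstract: [Purpose/Significance] Automatically recognizing the scientific elements in project application abstracts is of considerable research value for revealing the scientific knowledge contained in science and technology projects. Recognizing these elements depends on structured project abstract texts, yet structured project abstract corpora are currently scarce, which severely constrains further research in this area. This study therefore constructs a move corpus of project application abstracts to provide data support for related research. [Method/Process] First, the content of project abstracts was grouped into four move types: background and problem, objective and task, method and content, and value and significance. The characteristic marker features of each move were then summarized and a move annotation specification was formulated. Second, rule-based and then deep-learning-based methods were used in turn to assist manual annotation of the move structure of project abstracts, and the quality of the annotated corpus was evaluated after each round. [Result/Conclusion] Together, the two methods annotated nearly 25,000 sentences, and the annotation consistency coefficient reached 0.9839. This indicates that the corpus can largely distinguish the different move structures within project abstracts and meets the basic requirements for corpus construction at a preliminary level.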
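The abstract describes rule-assisted pre-annotation driven by per-move marker features, followed by a quality check on each annotation round, but it does not list the marker features or name the consistency coefficient. As a rough illustration only, the Python sketch below pairs a naive cue-phrase tagger for the four move types with Cohen's kappa as one common inter-annotator agreement measure. The cue lists, the label identifiers, the function names (`rule_label`, `pre_annotate`, `cohens_kappa`), and the choice of kappa are all assumptions, not the authors' specification.

```python
from collections import Counter

# The four move types named in the abstract; these identifiers are hypothetical.
MOVES = ("background_problem", "objective_task", "method_content", "value_significance")

# Hypothetical English cue phrases, for illustration only. The abstract says each
# move has characteristic marker features but does not list them.
CUES = {
    "background_problem": ("however", "remains unclear", "lack of", "challenge"),
    "objective_task": ("this project aims", "we aim", "the goal of", "objective"),
    "method_content": ("we will use", "based on", "method", "model"),
    "value_significance": ("significance", "will provide", "contribute to", "value"),
}


def rule_label(sentence):
    """Suggest a move label by counting cue-phrase hits (case-insensitive substring match).

    Returns None when no cue matches. Ties go to the move listed first in MOVES.
    """
    text = sentence.lower()
    scores = {move: sum(cue in text for cue in CUES[move]) for move in MOVES}
    best = max(MOVES, key=lambda move: scores[move])  # max keeps the first of tied moves
    return best if scores[best] > 0 else None


def pre_annotate(sentences):
    """Split sentences into rule-suggested labels (by index) and indices left for annotators."""
    suggested, unlabeled = {}, []
    for i, sentence in enumerate(sentences):
        label = rule_label(sentence)
        if label is None:
            unlabeled.append(i)
        else:
            suggested[i] = label
    return suggested, unlabeled


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' labels over the same sentences (assumed measure)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each label's marginal frequencies, summed over labels.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so treat it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    annotator_a = ["background_problem", "objective_task", "method_content", "value_significance"]
    annotator_b = ["background_problem", "objective_task", "method_content", "method_content"]
    print(f"{cohens_kappa(annotator_a, annotator_b):.3f}")  # prints 0.667
```

In an iterative workflow of this kind, sentences returned in `unlabeled`, along with any rule suggestions that annotators overturn, would go to manual annotation, and a trained classifier could later take over from `rule_label` as the pre-annotator. Both of these steps are assumptions about how such a loop might be wired, not details given in the abstract.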

Keywords: move recognition; project application abstract text; move corpus construction; iterative annotation
Received: 2022-03-29
Revised: 2022-08-13
