Identifying Breakthrough Articles in the Medical Field Based on the Full Text of Citations
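Cite this article: Wang Xue, Yang Xuemei, Lin Ziluo, Guan Zhihao, Tang Xiaoli. Identifying breakthrough articles in the medical field based on the full text of citations[J]. 情报杂志 (Journal of Information), 2021(3): 132-138.
Authors: Wang Xue  Yang Xuemei  Lin Ziluo  Guan Zhihao  Tang Xiaoli
Affiliation: Institute of Medical Information, Peking Union Medical College / Chinese Academy of Medical Sciences
Funding: This work is one of the outcomes of the CAMS Innovation Fund for Medical Sciences project "Research on the Evaluation of Medical Science and Technology Innovation and the Construction of the Health Service System" (No. 2016-I2M-3-018) and the National Science and Technology Library special project "Optimization, Integration and System Development of Key Technologies for the Next-Generation Open Knowledge Service Platform" (No. 2020XM05).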
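Abstract: [Purpose/Significance] Taking the perspective of evaluative citation by the academic community and working from the full text of citations, this study combines word-frequency statistics with deep learning to explore the textual features in citing sentences that signal a breakthrough evaluation, and builds an automatic recognition model to identify potential breakthrough articles among large volumes of literature. [Method/Process] Key publications of Nobel laureates in Physiology or Medicine and representative articles (medical field) on the topics of Science's top ten scientific breakthroughs served as a gold-standard set of breakthrough articles, and the sentences citing them were collected. Word-frequency statistics on these citing sentences, combined with manual screening, yielded the words commonly used to express breakthrough evaluations. The citing sentences were then manually annotated and used to train BERT and BioBERT models as automatic recognition models, and the cancer field was chosen for an empirical analysis. [Result/Conclusion] The academic community shows distinct textual features when evaluating articles of major breakthrough value. Compared with BERT, the biomedical language representation model BioBERT was clearly better at recognizing citing sentences that express breakthrough evaluations, reaching an F1 score of 0.84. An automatic recognition model based on citing sentences can identify articles of high academic value fairly accurately and, to some extent, enables their early identification and early evaluation.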

Keywords: full-text citations  deep learning  breakthrough articles  automatic identification  text classification  textual features
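The word-frequency step described in the abstract (counting words across citing sentences before manual screening) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the regex tokenizer, the small `STOPWORDS` set, the function names `tokenize` and `word_frequencies`, and the example sentences are all hypothetical, and the manual screening of the candidate list is left to a human reviewer.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real study would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "that", "this",
             "is", "was", "by", "for", "with"}


def tokenize(sentence: str) -> list[str]:
    """Lower-case a citing sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def word_frequencies(citing_sentences: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Count non-stop-word tokens across all citing sentences.

    Returns the top_n most frequent words as candidates for manual screening.
    """
    counts = Counter()
    for sentence in citing_sentences:
        counts.update(t for t in tokenize(sentence) if t not in STOPWORDS)
    return counts.most_common(top_n)


# Toy citing sentences, invented for illustration only (not data from the study).
sentences = [
    "This landmark discovery transformed cancer immunotherapy.",
    "The seminal study first described the pathway.",
    "A landmark study showed that checkpoint blockade is effective.",
]
print(word_frequencies(sentences, top_n=5))
```

On these three toy sentences, "landmark" and "study" each occur twice and head the list; a reviewer would then keep evaluative words such as "landmark" or "seminal" and discard topical ones such as "study" or "pathway".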

Research on Identification of Medical Breakthrough Articles Based on Citing Sentences
Wang Xue,Yang Xuemei,Lin Ziluo,Guan Zhihao,Tang Xiaoli.Research on Identification of Medical Breakthrough Articles Based on Citing Sentences[J].Journal of Information,2021(3):132-138.
Authors:Wang Xue  Yang Xuemei  Lin Ziluo  Guan Zhihao  Tang Xiaoli
Institution: Institute of Medical Information, Peking Union Medical College/Chinese Academy of Medical Sciences, Beijing 100005
Abstract: [Purpose/Significance] From the perspective of evaluative citation by the academic community, and based on citing sentences, this paper combines word-frequency statistics and deep learning to explore the textual features of citing sentences that express breakthrough evaluations, and builds an automatic recognition model to identify potential breakthrough articles. [Method/Process] The authors selected the key publications of Nobel Prize winners in Physiology or Medicine and the representative articles (medical field) of Science's Breakthrough of the Year as the gold-standard collection of breakthrough articles and obtained their citing sentences. Word-frequency statistics combined with manual screening were used to obtain the common words that characterize breakthrough evaluations. The authors manually labeled the citing sentences, trained BERT and BioBERT models on them to form automatic recognition models, and finally selected the cancer field for empirical analysis. [Result/Conclusion] The results show that the academic community uses distinctive textual features when evaluating literature of great breakthrough value. Compared with BERT, the BioBERT model showed improved recognition ability, with an F1 score of 0.84. The automatic recognition model based on citing sentences can accurately identify literature of important academic value and enables early recognition and early evaluation to a certain extent.
Keywords:citing sentences  deep learning  breakthrough articles  automatic identification  text classification  textual characteristics
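The abstract reports BioBERT's performance as an F1 score, the harmonic mean of precision P and recall R, F1 = 2PR/(P + R). A minimal sketch of that computation for a binary sentence classifier follows (label 1 = the citing sentence expresses a breakthrough evaluation), assuming the reported value is the positive-class F1, which the abstract does not state. The second function, `breakthrough_share`, is a hypothetical way to move from sentence predictions to article-level identification (the share of an article's citing sentences flagged as breakthrough evaluations); the abstract does not say how the authors aggregate, and both function names are illustrative.

```python
def precision_recall_f1(y_true: list[int], y_pred: list[int],
                        positive: int = 1) -> tuple[float, float, float]:
    """Precision, recall and F1 for the positive class of a binary sentence classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def breakthrough_share(predictions_by_article: dict[str, list[int]]) -> dict[str, float]:
    """Hypothetical article-level score from 0/1 sentence predictions.

    Returns, per article, the fraction of its citing sentences predicted as
    breakthrough evaluations; articles with no citing sentences are skipped.
    """
    return {
        article: sum(preds) / len(preds)
        for article, preds in predictions_by_article.items()
        if preds
    }


# Toy labels and predictions, invented for illustration only.
print(precision_recall_f1([1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1]))
print(breakthrough_share({"article_A": [1, 1, 0, 1], "article_B": [0, 0, 1]}))
```

With the toy labels there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all equal 0.75. A share-based score is only one possible design; a threshold on the count of flagged sentences, or a time-windowed share for early identification, would be equally plausible assumptions.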
This article is indexed in VIP (维普) and other databases.