
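Empirical Research on TF-IDF Assisted Indexing Algorithm Based on Users' Natural Annotation
Cite this article: Chen Baixue, Song Peiyan. Empirical Research on TF-IDF Assisted Indexing Algorithm Based on Users' Natural Annotation [J]. Library and Information Service, 2018, 62(1): 132-139.
Authors: Chen Baixue, Song Peiyan
Affiliation: Institute of Scientific and Technical Information of China, Beijing 100038
Funding: This paper is one of the outcomes of the 2016 National Social Science Fund of China project "Research on Discovering Review Experts for Research Projects Based on Knowledge Organization" (Project No. 16BTQ079) and the 2017 general project of the Innovation Research Fund of the Institute of Scientific and Technical Information of China, "Research on Methods for Dynamically Constructing Knowledge Graphs for National Science and Technology Big Data" (Project No. MS2017-06).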
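Abstract: [Purpose/Significance] Taking the user's perspective, this paper studies a TF-IDF assisted indexing algorithm based on users' natural annotation. [Method/Process] First, the keywords and classification numbers that authors assigned to their core-journal papers were used as source data: keyword frequencies were counted and the TF-IDF algorithm was applied to build a user annotation vocabulary, forming an indexing knowledge base. Next, the science and technology project data to be indexed were segmented into words and stripped of stop words with the IK Analyzer word segmentation software, and feature words were extracted from the project data using the TF-IDF algorithm together with a position-weighting algorithm. Finally, keywords and classification numbers were assigned to the project data simultaneously. [Result/Conclusion] Experimental results show that for 68.1% of the science and technology project records, the similarity ratio between machine-assigned and human-assigned keywords is above 60%, and for 83.9% of the records the first three characters of the machine-assigned classification number agree with those of the human-assigned one. These results indicate that applying the TF-IDF algorithm to users' natural annotation data is feasible for both keyword indexing and classification indexing.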
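The method steps above can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not give the exact TF-IDF variant, the position weights, or how keywords are mapped to classification numbers, so the smoothed IDF, the hypothetical `POSITION_WEIGHTS` (title tokens count 3, body tokens 1), restricting candidates to the user annotation vocabulary, and the majority vote over co-annotated classification codes are all assumptions. IK Analyzer segmentation is replaced here by pre-tokenized English tokens, the codes `G254` and `TP391` are illustrative only, and every function name is hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical position weights (the paper's actual values are not stated):
# a token in the project title counts three times as much as one in the body.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def build_knowledge_base(papers):
    """Map each author keyword to a Counter of the classification codes it was
    annotated with. `papers` is a list of (keywords, class_code) pairs."""
    kb = defaultdict(Counter)
    for keywords, code in papers:
        for kw in set(keywords):
            kb[kw][code] += 1
    return kb


def document_frequencies(projects):
    """For every token, count how many projects (all fields pooled) contain it."""
    df = Counter()
    for fields in projects:
        df.update({tok for tokens in fields.values() for tok in tokens})
    return df


def extract_keywords(fields, df, n_docs, vocabulary, stopwords, top_k=3):
    """Rank vocabulary terms of one project by position-weighted TF x smoothed IDF."""
    weighted_tf = Counter()
    total = 0.0
    for position, tokens in fields.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for tok in tokens:
            if tok in stopwords:
                continue  # stop-word removal
            total += weight
            if tok in vocabulary:  # keep only terms users actually used as keywords
                weighted_tf[tok] += weight
    if total == 0:
        return []
    scores = {
        t: (tf / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
        for t, tf in weighted_tf.items()
    }
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def assign_class(keywords, kb):
    """Pick the code most often co-annotated with the chosen keywords (assumed rule)."""
    votes = Counter()
    for kw in keywords:
        votes.update(kb.get(kw, Counter()))
    if not votes:
        return None
    return min(votes, key=lambda c: (-votes[c], c))


papers = [
    (["indexing", "tfidf"], "G254"),
    (["indexing", "ontology"], "G254"),
    (["segmentation", "tfidf"], "TP391"),
]
kb = build_knowledge_base(papers)
projects = [
    {"title": ["indexing", "tfidf"], "body": ["the", "indexing", "of", "project", "data"]},
    {"title": ["segmentation"], "body": ["chinese", "segmentation", "tfidf"]},
]
df = document_frequencies(projects)
stop = {"the", "of"}
for p in projects:
    kws = extract_keywords(p, df, len(projects), set(kb), stop)
    print(kws, assign_class(kws, kb))
# ['indexing', 'tfidf'] G254
# ['segmentation', 'tfidf'] TP391
```

Limiting candidates to terms that authors have already used as keywords is one plausible way to tie the extracted feature words to the user annotation vocabulary; reusing the same knowledge base for the classification vote lets keywords and classification numbers be assigned in a single pass, in the spirit of the simultaneous indexing the abstract describes.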

Keywords: assisted indexing; user natural annotation; TF-IDF algorithm; information organization
Received: 2017-07-10

Empirical Research on TF-IDF Assisted Indexing Algorithm Based on Users' Natural Annotation
Chen Baixue, Song Peiyan. Empirical Research on TF-IDF Assisted Indexing Algorithm Based on Users' Natural Annotation [J]. Library and Information Service, 2018, 62(1): 132-139.
Authors: Chen Baixue, Song Peiyan
Institution: Institute of Scientific and Technical Information of China, Beijing 100038
Abstract: [Purpose/significance] This paper studies, from the users' point of view, a TF-IDF assisted indexing algorithm based on users' natural annotation. [Method/process] First, the keywords and classification numbers that authors assigned to papers in Chinese core journals were taken as the data source, and a user natural annotation vocabulary was constructed by counting keyword frequencies and applying the TF-IDF algorithm. Second, the science and technology project data were segmented and stripped of stop words with the IK Analyzer word segmentation software, and feature words were extracted with the TF-IDF algorithm combined with position weighting. Finally, keywords and classification numbers were assigned to the project data simultaneously. [Result/conclusion] For 68.1% of the science and technology project records, the similarity ratio between machine-indexed and human-indexed keywords exceeds 60%, and for 83.9% of the records the first three characters of the machine-indexed classification number match those of the human-indexed one. These results indicate that applying the TF-IDF algorithm to users' natural annotation data is feasible for keyword and classification indexing.
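The two evaluation figures reported above can be computed along the lines of the following Python sketch. The abstract does not define the keyword "similarity ratio", so it is assumed here to be the fraction of human-assigned keywords that the machine also assigned, and the 60% threshold is treated as inclusive (>= 0.6); the three-character comparison of classification numbers follows the abstract directly. The function names and sample records are hypothetical.

```python
def keyword_similarity(machine, human):
    """Assumed definition: fraction of the human-assigned keywords that the
    machine also assigned (the paper's exact formula may differ)."""
    human_set = set(human)
    if not human_set:
        return 0.0
    return len(set(machine) & human_set) / len(human_set)


def class_prefix_match(machine_code, human_code, n=3):
    """True when the first n characters of both classification numbers agree."""
    if not machine_code or not human_code:
        return False
    return machine_code[:n] == human_code[:n]


def evaluate(records, threshold=0.6):
    """records: (machine_kws, human_kws, machine_code, human_code) tuples.
    Returns (share of records whose keyword similarity >= threshold,
             share of records whose classification numbers match on 3 characters)."""
    if not records:
        return 0.0, 0.0
    kw_hits = sum(keyword_similarity(m, h) >= threshold for m, h, _, _ in records)
    cls_hits = sum(class_prefix_match(mc, hc) for _, _, mc, hc in records)
    return kw_hits / len(records), cls_hits / len(records)


records = [
    (["a", "b", "c"], ["a", "b", "d"], "G254.1", "G254.9"),            # 2/3 similar, "G25" matches
    (["a"], ["a", "b", "c"], "TP391", "G254"),                         # 1/3 similar, no match
    (["x", "y"], ["x", "y"], "TP391.1", "TP311"),                      # identical, "TP3" matches
    (["a", "b", "c"], ["a", "b", "c", "d", "e"], None, "G254"),        # 3/5 similar, no machine code
]
print(evaluate(records))  # (0.75, 0.5)
```

Comparing only a prefix of the classification number rewards assignments that land in the right broad class even when the finer subdivision differs, which is presumably why a partial match is reported alongside the keyword figure.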
Keywords: assisted indexing; user natural annotation; TF-IDF algorithm; information organization