
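Automatic Mapping Method and Empirical Research of U.S. Commerce Control List Data and Patent Data
Cite this article: Lyu Lucheng, Han Tao, Chen Fang, Wang Xuezhao, Zhao Yajuan, Guo Shijie. Automatic Mapping Method and Empirical Research of U.S. Commerce Control List Data and Patent Data[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2022, 41(1): 50-61.
Authors: Lyu Lucheng  Han Tao  Chen Fang  Wang Xuezhao  Zhao Yajuan  Guo Shijie
Affiliations: National Science Library, Chinese Academy of Sciences, Beijing 100190; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; National Science Library, Chinese Academy of Sciences, Beijing 100190
Funding: Chinese Academy of Sciences Young Talent Project "Patent Industry Classification Based on Deep Learning" (G180161001).
Abstract: To efficiently analyze the gap between China and the United States in the controlled technologies recorded in the U.S. Commerce Control List (CCL), and to address the highly unstructured nature of CCL data, this paper proposes a method for automatically mapping control list data to patent data, so that the China-U.S. technology gap can be revealed automatically from a patent perspective. Drawing on text mining, the study defines a normalization workflow for control list text and proposes an automatic mapping method, based on TF-IDF (term frequency-inverse document frequency) and Word2Vec, between control list data and patent data, together with indicators for evaluating the mapping. The empirical study uses the 2019 U.S. Commerce Control List and global PCT (Patent Cooperation Treaty) patent applications from 2018. Evaluation of the models shows that the Word2Vec model gives the best automatic mapping at a text similarity threshold of 0.87, and the technology gap analysis is carried out on that basis. The proposed method automatically maps control list data to patent data and supports intelligence analysis; its results are highly interpretable, and it is an effective means of improving the timeliness of intelligence analysis, with high practical value.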

Keywords: Commerce Control List  patent data  text similarity  Word2Vec  technology gap
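
The abstract names TF-IDF as one of the two text representations used for the mapping but gives no implementation details. The block below is a minimal sketch of the general idea, not the authors' pipeline: it builds TF-IDF vectors for control list items and patent texts, then keeps every patent whose cosine similarity to an item reaches a threshold. The tokenizer, the smoothed IDF formula, the sample texts, and the threshold of 0.2 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a toy normalizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_items_to_patents(items, patents, threshold):
    """Map each item index to the patent indices whose similarity >= threshold."""
    vecs = tfidf_vectors(items + patents)  # shared vocabulary and IDF
    item_vecs, patent_vecs = vecs[:len(items)], vecs[len(items):]
    return {
        i: [j for j, pv in enumerate(patent_vecs) if cosine(iv, pv) >= threshold]
        for i, iv in enumerate(item_vecs)
    }


# Hypothetical texts, invented for this sketch (not taken from the CCL or PCT data).
items = [
    "lithium battery cells for underwater vehicles",
    "gas turbine engine blades",
]
patents = [
    "a lithium battery cell with improved electrolyte",
    "turbine blade cooling for a gas turbine engine",
    "method for brewing coffee",
]

# The threshold 0.2 is illustrative only; it is not a value from the paper.
mapping = map_items_to_patents(items, patents, threshold=0.2)
print(mapping)
```

Because TF-IDF only rewards exact term overlap, "cells" and "cell" do not match here; this kind of gap is one reason an embedding-based representation such as Word2Vec may behave differently on the same data.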

Automatic Mapping Method and Empirical Research of U.S. Commerce Control List Data and Patent Data
Lyu Lucheng, Han Tao, Chen Fang, Wang Xuezhao, Zhao Yajuan, Guo Shijie. Automatic Mapping Method and Empirical Research of U.S. Commerce Control List Data and Patent Data[J]. Journal of the China Society for Scientific and Technical Information, 2022, 41(1): 50-61.
Authors:Lyu Lucheng  Han Tao  Chen Fang  Wang Xuezhao  Zhao Yajuan  Guo Shijie
Institution: (National Science Library, Chinese Academy of Sciences, Beijing 100190; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190)
Abstract: To efficiently reveal the gap between China and the United States in the technologies recorded in the U.S. Commerce Control List (CCL), and given that CCL data are highly unstructured, this paper proposes a method that automatically maps CCL data to patent data and thereby reveals the technology gap from a patent perspective. Drawing on text mining, it defines a standardization process for CCL text and proposes an automatic mapping method between CCL data and patent data based on TF-IDF and Word2Vec, together with indicators for evaluating the mapping. An empirical study is conducted on the 2019 U.S. CCL data and the 2018 global Patent Cooperation Treaty (PCT) patent application data. Model evaluation shows that the Word2Vec model produces the best automatic mapping at a text similarity threshold of 0.87, and the technology gap analysis is carried out with this model. The proposed method automatically maps CCL data to patent data and supports intelligence analysis, and its results are highly interpretable. It is a useful tool for improving the timeliness of intelligence analysis and has high practical value.
Keywords:Commerce Control List  patent data  text similarity  Word2Vec  technology gap
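
The abstract reports that the Word2Vec model performed best at a similarity threshold of 0.87, without stating how texts were vectorized or which evaluation indicator was used. The sketch below shows one common way such a pipeline could look, under stated assumptions: a text is represented by the average of its word vectors, pairs are scored by cosine similarity, and a threshold is chosen by sweeping candidates against hand-labeled pairs. The tiny 3-dimensional vectors, the patent IDs, the labels, and the use of F1 as the indicator are all hypothetical; a real run would load vectors trained with Word2Vec.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "laser": (0.9, 0.1, 0.0),
    "optical": (0.8, 0.2, 0.1),
    "sensor": (0.6, 0.3, 0.2),
    "engine": (0.1, 0.9, 0.1),
    "turbine": (0.0, 0.8, 0.2),
    "coffee": (0.1, 0.1, 0.9),
}


def doc_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def f1_at(threshold, scored_pairs):
    """F1 of 'similarity >= threshold' against (similarity, is_relevant) pairs."""
    tp = sum(1 for s, y in scored_pairs if s >= threshold and y)
    fp = sum(1 for s, y in scored_pairs if s >= threshold and not y)
    fn = sum(1 for s, y in scored_pairs if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scored_pairs, candidates):
    """Return the first candidate threshold that reaches the highest F1."""
    return max(candidates, key=lambda t: f1_at(t, scored_pairs))


# Hypothetical control list item, patents, and relevance labels.
item_tokens = ["laser", "sensor"]
patent_tokens = {
    "P1": ["optical", "sensor"],
    "P2": ["turbine", "engine"],
    "P3": ["coffee"],
}
relevant = {"P1": True, "P2": False, "P3": False}

item_vec = doc_vector(item_tokens, TOY_VECTORS)
scores = {pid: cosine(item_vec, doc_vector(toks, TOY_VECTORS))
          for pid, toks in patent_tokens.items()}

# Sweep candidate thresholds against the labeled pairs.
scored_pairs = [(scores[pid], relevant[pid]) for pid in patent_tokens]
chosen = best_threshold(scored_pairs, [0.1, 0.3, 0.5, 0.7, 0.9])

# Apply the threshold reported in the abstract (0.87) to the toy scores.
matches = [pid for pid, s in scores.items() if s >= 0.87]
print(chosen, matches)
```

Averaging word vectors is only one way to turn Word2Vec output into a text representation, and the best threshold depends on the data and the indicator, so the 0.87 value should not be expected to transfer to other corpora.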
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).