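A Comparative Study of Doc2Vec-Based Methods for Detecting Patent Document Similarity
Cite this article: Cao Qi, Zhao Wei, Zhang Yingjie, Zhao Shujun, Chen Liang. A Comparative Study of Doc2Vec-Based Methods for Detecting Patent Document Similarity[J]. Library and Information Service (图书情报工作), 2018, 62(13): 74-81.
Authors: Cao Qi (曹祺)  Zhao Wei (赵伟)  Zhang Yingjie (张英杰)  Zhao Shujun (赵树君)  Chen Liang (陈亮)
Affiliations: 1,2,4 Institute of Scientific and Technical Information of China, Beijing 100038; 3. Wuhan University, Wuhan 430072
Funding: This work is one of the research outputs of two National Natural Science Foundation of China Youth Program projects: "Research on Distant Supervision Methods for Entity Relation Extraction in Patent Text" (grant no. 71704169) and "Application of Big Data Mining in Duplicate Checking of Science and Technology Projects" (grant no. 71303223).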
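Abstract: [Purpose/Significance] At the macro level, patent similarity measurement can support the formulation of national innovation strategy, reveal domestic and international hotspots, and help counter patent trolls from other countries. At the micro level, it provides supporting tools for patent inventors, patent examiners, and patent holders. [Method/Process] This paper proposes a patent similarity analysis method based on the deep learning Doc2Vec model. Working from a patent corpus that has not been cleaned, it applies the Doc2Vec model to randomly selected patents to study patent similarity detection, and compares the results with those of traditional similarity detection models. [Result/Conclusion] The experiments show that the Doc2Vec model and the TF-IDF model produce similar results on patent corpora that have not been cleaned. The method places low demands on the analyst's patent domain knowledge and does not require cleaning the patent data on the basis of that knowledge. It can also provide a new intelligent tool for patent infringement analysis and patent novelty search, lowering the research threshold and workload and improving research efficiency.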

Keywords: patent  similarity  deep learning  Doc2Vec
Received: 2017-10-16

Comparative Study of Patent Documents Similarity Detection on Deep Learning of Doc2Vec Based Methods
Cao Qi, Zhao Wei, Zhang Yingjie, Zhao Shujun, Chen Liang. Comparative Study of Patent Documents Similarity Detection on Deep Learning of Doc2Vec Based Methods[J]. Library and Information Service, 2018, 62(13): 74-81.
Authors: Cao Qi  Zhao Wei  Zhang Yingjie  Zhao Shujun  Chen Liang
Institution: 1,2,4 Institute of Scientific and Technical Information of China, Beijing 100038; 3. Wuhan University, Wuhan 430072
Abstract: [Purpose/significance] At the macro level, patent similarity detection supports the formulation of national innovation strategy, identifies hotspots in China and worldwide, and helps counter patent trolls from other countries. At the micro level, it provides support for patent inventors, patent examiners, and patentees. [Method/process] A new method based on the deep learning Doc2Vec model is proposed, using a patent corpus that has not undergone domain-knowledge-based data cleaning. Patents were randomly selected for similarity detection with the new method, and the results were compared with those of traditional similarity detection models. [Result/conclusion] The experimental results show that the Doc2Vec method and the TF-IDF model produce similar results when both are applied to a patent corpus without domain-knowledge-based data cleaning. The new method requires less domain expertise from the analyst and does not require a data cleaning step. It can provide a new intelligent support tool for patent infringement analysis and patent novelty search, lower the research threshold and workload, and improve research efficiency.
Keywords:patent  similarity  deep-learning  Doc2Vec  
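
As a rough illustration of the traditional baseline named in the abstract, the following is a minimal sketch of TF-IDF similarity ranking over an uncleaned corpus. It is not the authors' implementation: the toy corpus, the whitespace tokenization, the smoothed IDF formula, and the function names (tfidf_vectors, cosine, rank_by_similarity) are all assumptions made for illustration. The Doc2Vec side of the comparison would normally rely on a trained library model and is not shown here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF (an assumed variant) keeps ubiquitous terms from vanishing.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_index, vectors):
    """Return (doc_index, score) pairs for all other documents, most similar first."""
    scores = [
        (i, cosine(vectors[query_index], vec))
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus: raw whitespace tokens, no domain-specific cleaning.
corpus = [
    "a battery pack with a lithium cell and a cooling plate".split(),
    "a lithium cell battery pack comprising a cooling channel".split(),
    "a method for brewing coffee with a paper filter".split(),
]
vectors = tfidf_vectors(corpus)
ranking = rank_by_similarity(0, vectors)
```

Here ranking lists the other documents in descending order of similarity to the first one. A Doc2Vec comparison would swap tfidf_vectors for document embeddings from a trained model while keeping the same cosine ranking step.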