
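Research on Identification of Core Patents Based on SMOTE-LOF-Adaboost Model
Cite this article: Li Ying, Wu Zengyuan, Chen Liang. Research on Identification of Core Patents Based on SMOTE-LOF-Adaboost Model[J]. 科技管理研究 (Science and Technology Management Research), 2023(21):171-177.
Authors: Li Ying (李颖), Wu Zengyuan (吴增源), Chen Liang (陈亮)
Affiliations: 1. College of Economics and Management, China Jiliang University; 2. College of Optical and Electronic Technology, China Jiliang University
Funding: Zhejiang Provincial Key R&D Program project "Research and Development of Key Internet-Based Testing Technologies for the New-Material Luminescence Industry Chain" (2021C01027); Zhejiang Provincial Natural Science Foundation project "Research on the Mechanism of Collective Intelligence Emergence in Crowd-Innovation Communities Based on Open Knowledge" (LY20G01008)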
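Abstract: To address the low accuracy of core patent identification, the indicator system is reconstructed; to address the poor performance of traditional core patent identification methods on imbalanced data, a combined model of resampling and ensemble learning is proposed. First, inventor-related indicators are added to the traditional indicator set. Second, the Synthetic Minority Over-sampling Technique (SMOTE) is used to add minority-class samples and resolve the data imbalance, the Local Outlier Factor (LOF) algorithm is used to denoise the newly generated samples, and both are combined with the adaptive boosting algorithm (Adaboost) to form the SMOTE-LOF-Adaboost model. Finally, taking 22,077 photovoltaic patents from 2012 to 2016 in the Patsnap (智慧芽) patent database as an example, SVM, Adaboost, SMOTE-Adaboost, and SMOTE-LOF-Adaboost are compared empirically. The results show that the SMOTE-LOF-Adaboost model achieves a mean AUC of 0.9776 and a mean Recall of 0.9860, both better than the other three models, indicating that the model can improve the accuracy of core patent prediction.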

Keywords: combined model; core patent identification; machine learning
Received: 2023/5/26
Revised: 2023/6/8

Research on Identification of Core Patents Based on SMOTE-LOF-Adaboost Model
Abstract: To address the low accuracy of core patent identification, the indicator system was reconstructed. To address the poor performance of traditional core patent identification methods on imbalanced data, a combined model of resampling techniques and ensemble algorithms was proposed. First, inventor-related indicators were added to the traditional indicator set. Second, the Synthetic Minority Over-sampling Technique (SMOTE) was used to increase the number of minority-class samples and resolve the data imbalance. Then, the Local Outlier Factor (LOF) algorithm was used to denoise the newly generated samples, and both steps were combined with the Adaptive Boosting (Adaboost) algorithm to form the SMOTE-LOF-Adaboost model. Finally, taking 22,077 photovoltaic patents from 2012 to 2016 in the Patsnap patent database as an example, SVM, Adaboost, SMOTE-Adaboost, and SMOTE-LOF-Adaboost were compared empirically. The SMOTE-LOF-Adaboost model had a mean AUC of 0.9776, a mean Recall of 0.9860, and a mean F1 score of 0.9607, all superior to the other three models, and its standard deviation on each indicator was smaller. This indicates that the SMOTE-LOF-Adaboost model not only improves the accuracy of core patent prediction but also has higher model stability.
Keywords: combined model; core patent identification; machine learning
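
The three stages named in the abstract (oversample the minority class, denoise the synthetic samples, then boost) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the data is synthetic (`make_classification` stands in for the patent indicators), `smote_oversample` is a simplified hand-written SMOTE, and the choice to fit LOF on the original minority samples and keep only synthetic points it labels as inliers is an assumption, since the abstract does not specify how LOF is applied. All hyperparameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority-class neighbours (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Drop the first column, which is normally the query point itself.
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def lof_filter(X_min, X_syn, k=20):
    """Assumed denoising step: fit LOF on the real minority samples and
    keep only the synthetic samples it predicts as inliers (+1)."""
    lof = LocalOutlierFactor(n_neighbors=k, novelty=True).fit(X_min)
    return X_syn[lof.predict(X_syn) == 1]


# Synthetic imbalanced data standing in for the patent indicator matrix.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Stage 1: SMOTE on the training minority class only.
X_min = X_tr[y_tr == 1]
n_new = int((y_tr == 0).sum() - len(X_min))
X_syn = smote_oversample(X_min, n_new)

# Stage 2: LOF removes synthetic samples that look like outliers.
X_keep = lof_filter(X_min, X_syn)

# Stage 3: AdaBoost trained on the original plus retained synthetic data.
X_bal = np.vstack([X_tr, X_keep])
y_bal = np.concatenate([y_tr, np.ones(len(X_keep), dtype=y_tr.dtype)])
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
recall = recall_score(y_te, clf.predict(X_te))
print(f"AUC={auc:.4f}  Recall={recall:.4f}")
```

Resampling is applied only to the training split so that the test metrics reflect the original class distribution. The scores printed here come from toy data and are not comparable to the AUC, Recall, or F1 values reported in the abstract.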