Similar literature
 Found 20 similar documents; search time: 46 ms
1.
Authorship disambiguation is an urgent issue that affects the quality of digital library services, and supervised solutions that deliver state-of-the-art effectiveness have been proposed for it. However, particular challenges may keep such techniques from reaching their full potential: the prohibitive cost of labeling vast amounts of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (a few authors are very prolific, while most appear in only a few citations). In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) to specific authors. As our main contribution we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method, which explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), which extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on the examples that are most suitable for the task; and (3) SLAND (Self-Training LAND), which extends LAND with self-training capabilities, thus drastically reducing the number of examples required for building effective disambiguation functions, and which can also detect novel/unseen authors in the test set. Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND is able to outperform state-of-the-art supervised disambiguators, providing gains that range from 12% to more than 400%, making it extremely effective and practical.

2.
Author disambiguation resolves same-name author occurrences in bibliographic data into distinct namesakes. This enables author-centered searches and high-quality social network analysis. To promote research in author disambiguation, KISTI has constructed a new large-scale test set for this field. This article describes its semi-manual creation procedure and its characteristics, especially in terms of author ambiguity and name diversity. In addition, the baseline performance of author clustering on the test set is provided.

3.
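Author disambiguation in bibliographic records is a form of personal name disambiguation and has attracted wide attention in academia in recent years. Document clustering is an important approach to this task, but its experimental results are often unsatisfactory. This paper therefore improves the K-means text clustering method and uses the improved method to disambiguate authors. Experimental results show that the improved K-means algorithm effectively improves author disambiguation performance.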

4.
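[Purpose/Significance] To address the problem of identifying authors who share the same name and to improve the accuracy of author name disambiguation. [Method/Process] Building on the integration of features such as author affiliation and e-mail address, this paper focuses on the continuity and evolution of an author's research directions and research content. It proposes constructing an author corpus that combines article titles, keywords, abstracts, and citations with auxiliary information such as the author's coauthor list, e-mail address, and institution; using Doc2vec for deep text representation learning; and, on top of the learned features, training a support vector machine (SVM) on manually labeled samples. Taking the full PubMed Central (PMC) data as an example, the model is first tuned to obtain good local results and is then applied to the entire PMC dataset. [Results/Conclusions] The proposed name disambiguation method achieves an accuracy of 91.80%, effectively improving disambiguation accuracy. The method not only captures traditional features such as author institution, e-mail address, and coauthor list, but also traces authors through the continuity and evolution of their research content, integrating multiple kinds of features to overcome the failure of disambiguation that relies solely on information such as affiliation and e-mail address. It shows strong application prospects as scholar mobility increases. [Innovation/Limitations] This study packages each author as a separate document that contains all of the author's attributes and related information, and completes the disambiguation task by combining unsupervised text representation learning with supervised machine learning. It is well suited to data in the life sciences and medicine.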

5.
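Cao Xia (曹霞), Cui Lei (崔雷). 《现代情报》 (Journal of Modern Information), 2016, 36(3): 129-134
Using seven core journals in medical informatics indexed in JCR 2014 as the data source, this study generates an author co-occurrence matrix with the bibliographic co-occurrence analysis system BICOMB, visualizes the coauthorship network of highly productive authors with UCINET, and analyzes the network's density, average distance, cohesive subgroups, core-periphery structure, and centrality. The analysis reveals the overall structural characteristics of the coauthorship network in medical informatics outside China, its core academic groups, and the collaboration among highly productive authors. The results show that information exchange across the coauthorship network of highly productive authors is poor, the scope of collaboration is narrow, the collaboration pattern is uniform, and bridging authors who can connect different coauthorship groups are lacking.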

6.
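An analysis of author, institution, and country coauthorship networks in international library and information science   Total citations: 1, self-citations: 0, citations by others: 1
Cao Xia (曹霞), Cui Lei (崔雷), Huang Peng (黄鹏). 《现代情报》 (Journal of Modern Information), 2017, 37(1): 142
Using SPSS, 20 journals were randomly selected as the data source from the 86 core library and information science journals indexed in JCR 2015. Co-occurrence matrices were generated with the bibliographic co-occurrence analysis system BICOMB, and the author, institution, and country coauthorship networks were visualized with UCINET and Pajek. The study analyzes each network's clustering coefficient, average distance, and high-frequency collaboration groups, tests for the small-world property, and reveals the overall network structure, the core academic groups, and the relationships among high-frequency coauthorship groups. The results show that publication output, coauthorship rate, and coauthorship size in international library and information science have generally risen year by year, and that collaborative research has become an irreversible trend. All three coauthorship networks have a large clustering coefficient and a short average distance, consistent with the small-world theory of complex networks, which indicates that the overall coauthorship network is highly connected, with frequent internal exchange and smooth information transfer. High-frequency coauthors and high-frequency collaborating institutions have well-defined research directions and tend to publish their continuing research results in the same journal.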

7.
Research into invention, innovation policy, and technology strategy can greatly benefit from an accurate understanding of inventor careers. The United States Patent and Trademark Office does not provide unique inventor identifiers, however, making large-scale studies challenging. Many scholars of innovation have implemented ad-hoc disambiguation methods based on string similarity thresholds and string comparison matching; such methods have been shown to be vulnerable to a number of problems that can adversely affect research results. The authors address this issue by contributing (1) an application of the Author-ity disambiguation approach (0170 and 0175) to the US utility patent database, (2) a new iterative blocking scheme that expands the match space of this algorithm while maintaining scalability, (3) a public posting of the algorithm and code, and (4) a public posting of the results of the algorithm in the form of a database of inventors and their associated patents. The paper provides an overview of the disambiguation method, assesses its accuracy, and calculates network measures based on co-authorship and collaboration variables. It illustrates the potential for large-scale innovation studies across time and space with visualizations of inventor mobility across the United States. The complete input and results data from the original disambiguation are available at (http://dvn.iq.harvard.edu/dvn/dv/patent); revised data described here are at (http://funglab.berkeley.edu/pub/disamb_no_postpolishing.csv); original and revised code is available at (https://github.com/funginstitute/disambiguator); visualizations of inventor mobility are at (http://funglab.berkeley.edu/mobility/).

8.
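Chang Ning (昌宁), Dou Yongxiang (窦永香), Xu Wei (徐薇). 《情报科学》 (Information Science), 2021, 39(6): 108-116
[Purpose/Significance] This paper uses multi-source data to disambiguate the names of authors of scientific and technical literature, so that authors and documents are matched one to one. [Method/Process] The collected multi-source data are first preprocessed to form a dataset of same-name records awaiting resolution, made up of documents by authors who share a name. Academic circles are then built from collaboration relationships to detect ambiguity, and finally disambiguation is performed using institution and research field. [Results/Conclusions] The experiment collected bibliographic records from six disciplines: education at all levels, automation and computer technology, information and knowledge dissemination, mathematical sciences and chemistry, radio electronics, and Chinese medicine. The rule-based disambiguation proposed in this paper performs well. Fusing multi-source data and disambiguating on multiple indicators (institution and field) achieves good disambiguation performance. [Innovation/Limitations] The study solves the difficult problem of disambiguating authors within the same institution and the same field, takes incremental updates into account, and builds a complete disambiguation model.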

9.
Existing credit allocation methods for coauthored research papers cannot tell the whole story about who did what, or about which parts of an article are being acknowledged. When an article is cited, the first author often gets the primary or even full credit, even if the citing paper cites the method part of the article, which was mainly contributed by the second author. This study proposes a context-based author credit (CAC) model to allocate individual credit to coauthors of a multi-authored paper. In the proposed model, a coauthor's credit is conceptualized as a directed and weighted connection between citations and contributor roles, where the relationship is determined by the citation context. The model uses citation strength instead of the number of citing papers, which can make the allocation of research credit more precise. The proposed approach can complement existing measures of author credit that are based on author signature order. In our experiments, the model was validated by fitting it to empirical data from PLOS Medicine: articles by a group of highly productive authors and the papers citing them. The results show that the CAC model outperforms prior alternatives, such as normal, fractional, and harmonic counting and author contribution based solely on the contribution list, in reflecting the specific performance of coauthors. In addition, the CAC model is sensitive to the contributions of lower-ranked authors, breaking through the restriction of the author signature order. This paper also presents a new application of the model to academic evaluation of authors.
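The fractional and harmonic counting baselines named in the abstract above can be illustrated with a short sketch. It uses the commonly cited definitions (an equal share 1/N for fractional counting; a share proportional to 1/i for the author in byline position i for harmonic counting). The function names are hypothetical, and this is not the CAC model itself, whose details the abstract does not give.

```python
def fractional_credit(n_authors):
    """Fractional counting: each of the N coauthors receives an equal share 1/N."""
    if n_authors < 1:
        raise ValueError("n_authors must be at least 1")
    return [1.0 / n_authors] * n_authors


def harmonic_credit(n_authors):
    """Harmonic counting: the author in byline position i (1-based) receives
    (1/i) divided by the sum of 1/j for j = 1..N, so the shares sum to 1."""
    if n_authors < 1:
        raise ValueError("n_authors must be at least 1")
    weights = [1.0 / rank for rank in range(1, n_authors + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

For a three-author paper, harmonic counting gives shares of 6/11, 3/11, and 2/11 (about 0.545, 0.273, and 0.182), whereas fractional counting gives 1/3 each. Both schemes depend only on byline position or author count, which is the limitation the abstract attributes to order-based measures.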

10.
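Research on China's information science research coauthorship network and analysis of its characteristic parameters   Total citations: 1, self-citations: 0, citations by others: 1
Using coauthorship data collected from 17 core Chinese journals in library and information science for 2001-2007, this paper builds and studies the research coauthorship network of information science in China, computes the network's characteristic parameters, and analyzes author influence. The results show that overall collaboration in the information science network is unsatisfactory and that the network contains few highly influential authors. Improvement requires effort on two fronts: cultivating highly influential authors and strengthening collaboration between teams.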

11.
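A comparative analysis of international coauthorship in Chinese and German oceanography papers in core journals   Total citations: 1, self-citations: 0, citations by others: 1
Wan Qi (万琪), Hua Weina (华薇娜). 《现代情报》 (Journal of Modern Information), 2016, 36(2): 150-156
This study uses the Web of Science database to collect oceanography papers published by China and Germany and divides them into an internationally coauthored group and a non-internationally coauthored group. Using bibliometric methods with EXCEL, SPSS, and UCINET, it compares the two countries' annual publication output, average number of authors and average number of references per paper, impact factors of the indexing journals, citation counts, and coauthoring countries. The results show that international coauthorship in Chinese oceanography started later than in Germany but has developed rapidly, and in recent years the number of internationally coauthored Chinese oceanography papers has exceeded Germany's. At the same time, the quality and academic influence of China's internationally coauthored oceanography papers are lower than Germany's, and the range of countries coauthoring with China is narrower than the range coauthoring with Germany.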

12.
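Yao Xiaohua (姚啸华). 《现代情报》 (Journal of Modern Information), 2011, 31(7): 51-54
Using social network analysis, this paper carries out an empirical study of the coauthorship network among scholars within a library and information science institution, taking the School of Information Management at Wuhan University as an example, from three perspectives: centrality analysis, cohesive subgroup analysis, and visualization. Based on the empirical results, it identifies the school's core authors and the subgroups of the coauthorship network, and summarizes the characteristics of coauthorship networks among scholars within library and information science institutions.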

13.
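Jiang Xin (姜鑫), Ma Haiqun (马海群). 《现代情报》 (Journal of Modern Information), 2016, 36(4): 170-177
Using as its data source 4,407 journal articles published in 2014 in 18 library and information science journals indexed in CSSCI, this paper examines research progress in library and information science in China in 2014 through co-word cluster analysis, strategic diagram analysis, author collaboration analysis, and author-keyword coupling analysis. Co-word cluster analysis and strategic diagram analysis identify 17 important research topics for 2014 and their evolution trends; author collaboration analysis identifies the state of scientific collaboration and the main collaborating groups in the field in 2014; and author-keyword coupling analysis identifies the main research areas of highly productive authors.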

14.
Author co-citation analysis (ACA) has been widely used in bibliometrics as a method for analyzing the intellectual structure of science studies. It can be used to identify authors from the same or similar research fields. However, this method relies heavily on statistical tools to perform the analysis and requires human interpretation. Web Citation Database is a data warehouse used for storing citation indices of Web publications. In this paper, we propose a mining process to automate ACA based on the Web Citation Database. The mining process uses agglomerative hierarchical clustering (AHC) as the mining technique for author clustering and multidimensional scaling (MDS) for displaying author cluster maps. The clustering results and author cluster map have been incorporated into a citation-based retrieval system known as PubSearch to support author retrieval of Web publications.

15.
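[Purpose/Significance] Research on author similarity is an important foundation for detecting the knowledge structure of a discipline and uncovering potential collaborations. [Method/Process] This paper constructs a two-mode keyword-document matrix for each author and determines author similarity by studying the similarity between the grayscale images that correspond to these matrices. It also uses grey relational theory to compute author similarity based on coauthorship and author similarity based on keywords, and finally carries out an empirical analysis that compares the three sets of author similarity results. [Results/Conclusions] The experiments show that author similarity based on the keyword-document matrix reduces the effects of differences in authors' publication counts and of the use of high-frequency keywords, while amplifying the differences and similarities in the authors' research content, so it reflects author similarity fairly accurately.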

16.
Co-authorship networks in the digital library research community   Total citations: 5, self-citations: 0, citations by others: 5
The field of digital libraries (DLs) coalesced in 1994: the first digital library conferences were held that year, awareness of the World Wide Web was accelerating, and the National Science Foundation awarded $24 million (US) for the Digital Library Initiative (DLI). In this paper we examine the state of the DL domain after a decade of activity by applying social network analysis to the co-authorship network of the past ACM, IEEE, and joint ACM/IEEE digital library conferences. We base our analysis on a common binary undirectional network model to represent the co-authorship network, and from it we extract several established network measures. We also introduce a weighted directional network model to represent the co-authorship network, for which we define AuthorRank as an indicator of the impact of an individual author in the network. The results are validated against conference program committee members in the same period. The results show clear advantages of PageRank and AuthorRank over degree, closeness and betweenness centrality metrics. We also investigate the amount and nature of international participation in the Joint Conference on Digital Libraries (JCDL).
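As a rough illustration of the binary undirected co-authorship model and the degree centrality measure mentioned in the abstract above, the sketch below builds the network from per-paper author lists. The function names and the input format are assumptions made for illustration. AuthorRank and the weighted directional model are not reproduced here, because the abstract does not define them.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(papers):
    """Build a binary undirected co-authorship network.

    `papers` is an iterable of author-name lists, one list per paper
    (a hypothetical input format). Two authors are linked if they share
    at least one paper; repeated collaborations do not add weight.
    """
    neighbors = defaultdict(set)
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            neighbors[author]  # touch the key so solo authors become isolated nodes
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def degree_centrality(neighbors):
    """Normalized degree centrality: number of coauthors divided by (n - 1)."""
    n = len(neighbors)
    if n < 2:
        return {author: 0.0 for author in neighbors}
    return {author: len(linked) / (n - 1) for author, linked in neighbors.items()}
```

With the papers `[["A", "B", "C"], ["A", "D"], ["E"]]`, author A is linked to three of the four other authors and so has a degree centrality of 0.75, while the solo author E has 0.0.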

17.
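A study of and reflections on coauthorship in Chinese information science journal papers   Total citations: 22, self-citations: 2, citations by others: 22
Wang Bing (汪冰). 《情报科学》 (Information Science), 1992, 13(2): 8-16
This paper presents a statistical analysis of coauthorship in papers from 10 information science journals, including 《情报学报》 and 《情报科学》. It examines in some depth the coauthorship rate, the types of coauthors, coauthorship intensity, the distribution of papers by number of authors, and the author coefficient per document unit and how it changes. Drawing on a large body of data, the paper compares coauthorship in Chinese and foreign information science papers and reflects further on the phenomenon, arguing that strengthening collaborative research and exploiting group effects will promote the development of information science in China.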

18.
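[Purpose/Significance] To compensate for the lack of content information in existing author influence indicators and to find highly influential authors under different research topics, this paper presents an author influence evaluation method based on topic content. [Method/Process] Using papers from core information science journals over the past five years as the sample, the CTM model is first used to extract topics from the sample documents and obtain each author's contribution value for each topic. The K-means algorithm is then used to classify the sample documents, which assigns the corresponding authors to specific topic categories. Next, an author's contribution value in a specific topic category is combined with the average citation count of the author's published papers to define the Author Influence Index in Specific Topic (AII-ST). Finally, authors are ranked by influence according to their AII-ST values. [Results/Conclusions] Methodologically, combining the CTM model with the K-means algorithm optimizes both the initial cluster centers and the number of clusters for K-means. In application, the AII-ST indicator effectively limits the range of authors being compared and reflects authors' research directions well. The new indicator offers a novel evaluation perspective and reliable evaluation results.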

19.
Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generally supervised ones. In this paper we propose a new unsupervised method that uses word sense discrimination in IR. The method we develop is based on spectral clustering and reorders an initially retrieved document list by boosting documents that are semantically similar to the target query. For several TREC ad hoc collections we show that our method is useful for queries that contain ambiguous terms. We are interested in improving precision after 5, 10, and 30 retrieved documents (P@5, P@10, and P@30, respectively). We show that precision can be improved by 8% above current state-of-the-art baselines. We also focus on poorly performing queries.
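The precision-at-cutoff measures (P@5, P@10, P@30) cited in the abstract above follow the standard definition: the fraction of the top k retrieved documents that are relevant. A minimal sketch is given below; the function name and the document identifiers are made up for illustration, and the spectral clustering re-ranking itself is not shown.

```python
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Return P@k: relevant documents among the top k results, divided by k.

    If fewer than k documents were retrieved, the missing ranks count as
    non-relevant, so the denominator stays k.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant_doc_ids)
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant)
    return hits / k
```

For a ranking `["d1", "d2", "d3", "d4", "d5"]` with relevant documents `{"d1", "d4", "d9"}`, P@5 is 2/5 = 0.4. A re-ranking method like the one described would aim to move relevant documents such as `d4` toward the top of the list, which raises P@k at small cutoffs.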

20.
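An analysis of the properties and group structure of China's information science research coauthorship network   Total citations: 1, self-citations: 1, citations by others: 0
Taking the research coauthorship network of information science in China as a case, this paper analyzes in depth the network's properties and the characteristics of its group structure. The analysis shows that the network has both small-world and scale-free properties. Connections within each group are dense, but connections between groups are loose, and the network's connectivity is low. To improve the current state of coauthorship, collaboration between groups needs to be strengthened.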
