

Classifying documents with link-based bibliometric measures
Authors: T Couto, N Ziviani, P Calado, M Cristo, M Gonçalves, E S de Moura, W Brandão
Institution: (1) Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil; (2) IST/INESC-ID, Lisbon, Portugal; (3) FUCAPI-Analysis, Research and Tech. Innovation Center, Manaus, Brazil; (4) Department of Computer Science, Federal University of Amazonas, Manaus, Brazil
Abstract: Automatic document classification can be used to organize documents in a digital library, construct on-line directories, improve the precision of web searching, or support interactions between users and search engines. In this paper we explore how linkage information inherent to different document collections can be used to enhance the effectiveness of classification algorithms. We experimented with three link-based bibliometric measures (co-citation, bibliographic coupling and Amsler) on three different document collections: a digital library of computer science papers, a web directory and an on-line encyclopedia. Results show that both hyperlink and citation information can be used to learn reliable and effective classifiers based on a kNN classifier. In one of the test collections, we obtained improvements of up to 69.8% in macro-averaged F1 over the traditional text-based kNN classifier, which serves as the baseline in our experiments. We also present alternative ways of combining bibliometric-based classifiers with text-based classifiers. Finally, we conducted studies to analyze the cases in which the bibliometric-based classifiers failed, and show that in such cases it is hard to reach consensus regarding the correct classes, even for human judges.
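The three measures named in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything here is an assumption: each measure is written as a Jaccard-style overlap of link sets (co-citation over the documents that cite a pair, bibliographic coupling over the documents the pair cites, Amsler over both combined), and the kNN step is a plain similarity-weighted vote. The function names, the toy link graph and the labels are hypothetical, and the paper's own normalization and voting scheme may differ.

```python
from collections import defaultdict

def build_link_sets(edges):
    """Build link sets from (citing, cited) pairs."""
    parents = defaultdict(set)   # parents[d]: documents that cite d
    children = defaultdict(set)  # children[d]: documents that d cites
    for citing, cited in edges:
        children[citing].add(cited)
        parents[cited].add(citing)
    return parents, children

def jaccard(a, b):
    """Overlap of two sets, normalized by their union (an assumed normalization)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cocitation(d1, d2, parents, children):
    # Two documents are similar if the same documents cite both.
    return jaccard(parents[d1], parents[d2])

def coupling(d1, d2, parents, children):
    # Two documents are similar if they cite the same documents.
    return jaccard(children[d1], children[d2])

def amsler(d1, d2, parents, children):
    # Combines both link directions for each document.
    return jaccard(parents[d1] | children[d1], parents[d2] | children[d2])

def knn_classify(doc, labels, sim, parents, children, k=3):
    """Similarity-weighted vote among the k most similar labeled documents."""
    scored = sorted(
        ((sim(doc, other, parents, children), other)
         for other in labels if other != doc),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for score, other in scored:
        if score > 0:  # neighbors with no link overlap carry no evidence
            votes[labels[other]] += score
    return max(votes, key=votes.get) if votes else None

# Hypothetical toy collection: a, b, c are labeled; q is to be classified.
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"),
         ("c", "z"), ("q", "x"), ("q", "y")]
labels = {"a": "databases", "b": "databases", "c": "retrieval"}
parents, children = build_link_sets(edges)
predicted = knn_classify("q", labels, coupling, parents, children, k=2)
```

In this toy graph `q` cites the same documents as `a` and `b`, so bibliographic coupling places it with them. A document with no link overlap with any labeled neighbor gets `None`, which is one situation where falling back to a text-based classifier would be a natural choice.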
This article is indexed in SpringerLink and other databases.