Entity ranking in Wikipedia: utilising categories, links and topic difficulty prediction |
| |
Authors: | Jovan Pehcevski, James A Thom, Anne-Marie Vercoustre, Vladimir Naumovski |
| |
Institution: | (1) Faculty of Informatics, European University, Skopje, Macedonia; (2) School of Computer Science, RMIT University, Melbourne, VIC, Australia; (3) INRIA, Rocquencourt, France; (4) T-Mobile, Skopje, Macedonia |
| |
Abstract: | Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity
extraction, where the goal is to tag names of entities in documents, entity ranking is primarily focused on returning a ranked
list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them were evaluated
on the INEX Wikipedia test collection. In this paper, we describe a system we developed for ranking Wikipedia entities in
answer to a query. The entity ranking approach implemented in our system utilises the known categories, the link structure
of Wikipedia, as well as the link co-occurrences with the entity examples (when provided) to retrieve relevant entities as
answers to the query. We also extend our entity ranking approach by utilising the knowledge of predicted classes of topic
difficulty. To predict the topic difficulty, we generate a classifier that uses features extracted from an INEX topic definition
to classify the topic into an experimentally pre-determined class. This knowledge is then utilised to dynamically set the
optimal values for the retrieval parameters of our entity ranking system. Our experiments demonstrate that the use of categories
and the link structure of Wikipedia can significantly improve entity ranking effectiveness, and that topic difficulty prediction
is a promising approach that could also be exploited to further improve the entity ranking performance. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|