Methods for automatically evaluating answers to complex questions
Authors: Jimmy Lin, Dina Demner-Fushman
Institution: (1) College of Information Studies, University of Maryland, College Park, MD 20742, USA; (2) Department of Computer Science, University of Maryland, College Park, MD 20742, USA; (3) Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
Abstract: Evaluation is a major driving force in advancing the state of the art in language technologies. In particular, methods for automatically assessing the quality of machine output are the preferred means of measuring progress, provided that these metrics have been validated against human judgments. Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, an automatic technique for evaluating answers to complex questions based on n-gram co-occurrences between machine output and a human-generated answer key. Until now, the only way to assess the correctness of answers to such questions has been to manually determine whether an information “nugget” appears in a system's response. The lack of automatic methods for scoring system output is an impediment to progress in the field, which we address with this work. Experiments with the TREC 2003, TREC 2004, and TREC 2005 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that POURPRE outperforms direct application of existing metrics.
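The core idea of matching machine output against answer-key nuggets via n-gram co-occurrence can be illustrated with a minimal sketch in Python. This is only a hedged illustration of the general technique, not the authors' POURPRE implementation; the function names (ngrams, nugget_match) and the simple recall-style score are assumptions for demonstration.

    from collections import Counter

    def ngrams(tokens, n):
        # Multiset of word n-grams from a token list.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def nugget_match(system_answer, nugget, n=1):
        # Fraction of the nugget's n-grams that also occur in the system answer.
        # POURPRE's actual scoring aggregates such matches over all nuggets
        # into nugget recall and precision, as described in the paper.
        sys_counts = ngrams(system_answer.lower().split(), n)
        nug_counts = ngrams(nugget.lower().split(), n)
        if not nug_counts:
            return 0.0
        matched = sum(min(c, sys_counts[g]) for g, c in nug_counts.items())
        return matched / sum(nug_counts.values())

    if __name__ == "__main__":
        nugget = "answers are scored by n-gram co-occurrence"
        response = "the system scores candidate answers using n-gram co-occurrence statistics"
        print(f"unigram match: {nugget_match(response, nugget):.2f}")

A soft, word-overlap score like this lets an evaluator rank systems without a human judging each response, which is the role the paper validates against official TREC rankings.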
Contact Information: Dina Demner-Fushman
Keywords: Question answering; Evaluation
This article is indexed in SpringerLink and other databases.