Automatic question answering using the web: Beyond the Factoid |
| |
Authors: | Radu Soricut, Eric Brill |
| |
Institution: | (1) Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292, USA; (2) Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA |
| |
Abstract: | In this paper we describe and evaluate a Question Answering (QA) system that goes beyond answering factoid questions. Our
approach to QA assumes no restrictions on the type of questions that are handled, and no assumption that the answers to be
provided are factoids. We present an unsupervised approach for collecting question and answer pairs from FAQ pages, which
we use to collect a corpus of 1 million question/answer pairs from FAQ pages available on the Web. This corpus is used to
train various statistical models employed by our QA system: a statistical chunker used to transform a natural language-posed
question into a phrase-based query to be submitted for exact match to an off-the-shelf search engine; an answer/question translation
model, used to assess the likelihood that a proposed answer is indeed an answer to the posed question; and an answer language
model, used to assess the likelihood that a proposed answer is a well-formed answer. We evaluate our QA system in a modular
fashion, by comparing the performance of baseline algorithms against our proposed algorithms for various modules in our QA
system. The evaluation shows that our system achieves reasonable performance in terms of answer accuracy for a large variety
of complex, non-factoid questions. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|