Exploiting redundancy in question answering

Abstract

Our goal is to automatically answer brief factual questions of the form "When was the Battle of Hastings?" or "Who wrote The Wind in the Willows?". Since the answer to nearly any such question can now be found somewhere on the Web, the problem reduces to finding potential answers in large volumes of data and validating their accuracy. We apply a method for arbitrary passage retrieval to the first half of the problem and demonstrate that answer redundancy can be used to address the second half. The success of our approach depends on the idea that the volume of available Web data is large enough to supply the answer to most factual questions multiple times and in multiple contexts. A query is generated from a question and this query is used to select short passages that may contain the answer from a large collection of Web data. These passages are analyzed to identify candidate answers. The frequency of these candidates within the passages is used to "vote" for the most likely answer. The approach is experimentally tested on questions taken from the TREC-9 question-answering test collection. As an additional demonstration, the approach is extended to answer multiple-choice trivia questions of the form typically asked in trivia quizzes and television game shows.
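
The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how candidates are extracted, so the sketch assumes candidates are single tokens that are neither stopwords nor question terms, and it counts each candidate once per passage. The function names, the stopword list, and the sample passages are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "was", "is", "by", "and", "to", "at", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vote_for_answer(question, passages):
    """Rank candidate tokens by the number of passages that contain them.

    Assumption: a candidate is any token that is neither a stopword nor a
    term that already appears in the question.
    """
    question_terms = set(tokenize(question))
    votes = Counter()
    for passage in passages:
        # A set ensures each candidate gets at most one vote per passage,
        # so a single repetitive passage cannot dominate the tally.
        candidates = {
            token
            for token in tokenize(passage)
            if token not in STOPWORDS and token not in question_terms
        }
        votes.update(candidates)
    return votes.most_common()


# Hypothetical retrieved passages for one of the abstract's example questions.
question = "When was the Battle of Hastings?"
passages = [
    "The Battle of Hastings was fought in 1066.",
    "In 1066 William of Normandy won the Battle of Hastings.",
    "Hastings is a town in England; the battle took place in 1066 nearby.",
]

ranked = vote_for_answer(question, passages)
print(ranked[0])  # ('1066', 3)
```

Because the correct answer recurs across independently written passages while incidental terms do not, the most frequent candidate rises to the top. A fuller system would presumably also restrict candidates to the answer type implied by the question (a date for "When", a person for "Who"), which this sketch omits.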
