Jimmy Lin and Boris Katz
MIT Artificial Intelligence Laboratory
Question answering systems have become increasingly popular because they give users short, succinct answers instead of overloading them with a large number of irrelevant documents. The vast amount of information readily available on the World Wide Web presents new opportunities and challenges for question answering. Such systems should benefit from the tremendous amount of useful knowledge on the Web, but at the same time they must cope with large volumes of useless data.
Many characteristics of the World Wide Web distinguish Web-based question answering from question answering on closed corpora such as newspaper texts. The Web is vastly larger and boasts incredible "data redundancy," which is amenable to data mining approaches for answer extraction. Surprisingly, a data-driven approach utilizing statistical techniques can yield high levels of performance and nicely complement traditional linguistically informed question answering technology. The Web also contains pockets of structured and semistructured knowledge that can serve as a valuable resource for question answering. By organizing these resources and annotating them with natural language, we can successfully incorporate Web knowledge sources into question answering systems.
This tutorial will survey recent Web-based question answering technology, focusing on two separate paradigms: knowledge mining using statistical tools and knowledge annotation using database concepts. Both approaches can employ a wide spectrum of techniques ranging in linguistic sophistication from simple "bag-of-words" treatments to full syntactic parsing.
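As a rough illustration of how data redundancy can support knowledge mining with a simple "bag-of-words" treatment, the sketch below lets text snippets vote for candidate answers. It is a minimal sketch under stated assumptions, not the method presented in the tutorial: the stopword list, the sample snippets, and the function names (`tokenize`, `candidate_ngrams`, `vote_for_answer`) are all hypothetical, and a real system would retrieve its snippets from a Web search engine.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "by", "and", "to", "who", "what", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_ngrams(tokens, max_n=2):
    """Yield every n-gram of length 1..max_n as a tuple of tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def vote_for_answer(question, snippets, max_n=2):
    """Rank candidate n-grams by the number of snippets that contain them."""
    question_words = set(tokenize(question))
    votes = Counter()
    for snippet in snippets:
        seen = set()
        for gram in candidate_ngrams(tokenize(snippet), max_n):
            # Skip candidates containing stopwords or words from the question itself.
            if any(w in STOPWORDS or w in question_words for w in gram):
                continue
            seen.add(gram)
        # Each snippet votes at most once per candidate, so one repetitive
        # page cannot dominate the tally.
        votes.update(seen)
    # Most votes first; ties go to the longer n-gram, then alphabetical order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return [(" ".join(gram), count) for gram, count in ranked]


if __name__ == "__main__":
    # Hypothetical snippets standing in for search results.
    snippets = [
        "The telephone was invented by Alexander Graham Bell in 1876.",
        "Alexander Graham Bell invented the telephone.",
        "In 1876, Bell patented the telephone.",
        "Bell invented the first practical telephone.",
    ]
    for answer, count in vote_for_answer("Who invented the telephone?", snippets)[:3]:
        print(answer, count)
```

No parsing or linguistic analysis is involved here; the ranking relies only on the same answer string recurring across independently written snippets, which is the property that makes redundancy useful.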