Finding similar questions in large question and answer archives

TLDR

Community-based Q&A services on the Web have grown rapidly, creating large archives that are valuable linguistic resources. Finding archived questions that are semantically similar to a user's question lets high-quality answers be retrieved without the delay of waiting for the community to respond. The paper estimates the probabilities for a translation-based retrieval model from the similarity between answers in the archive. The model retrieves semantically similar questions even when they share relatively little word overlap with the query.

Abstract

There has recently been a significant increase in the number of community-based question and answer services on the Web where people answer other people's questions. These services rapidly build up large archives of questions and answers, and these archives are a valuable linguistic resource. One of the major tasks in a question and answer service is to find questions in the archive that are semantically similar to a user's question. This enables high quality answers from the archive to be retrieved and removes the time lag associated with a community-based system. In this paper, we discuss methods for question retrieval that are based on using the similarity between answers in the archive to estimate probabilities for a translation-based retrieval model. We show that with this model it is possible to find semantically similar questions with relatively little word overlap.
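The abstract does not give the scoring formula, so the following is only a minimal sketch of how a translation-based retrieval model of this kind is commonly written: each query word is scored by summing, over the words of an archived question, a word-to-word translation probability times the word's relative frequency in that question, smoothed with a background probability. The function name `translation_log_score`, the smoothing weight `lam`, the `floor` value, and every number in the toy translation table are assumptions made for illustration. In the paper's setting the table would be estimated from question pairs whose answers are similar; here it is simply written by hand.

```python
import math
from collections import Counter


def translation_log_score(query, doc, trans_prob, collection_prob,
                          lam=0.2, floor=1e-6):
    """Log-probability of `query` given archived question `doc`.

    trans_prob[(w, t)] is an assumed probability that archive word t
    "translates" to query word w. collection_prob[w] is a background
    probability used for smoothing; unseen words fall back to `floor`.
    """
    counts = Counter(doc)
    doc_len = len(doc)
    log_score = 0.0
    for w in query:
        # Sum over archive words t of T(w | t) * P_ml(t | doc).
        translated = sum(
            trans_prob.get((w, t), 0.0) * (c / doc_len)
            for t, c in counts.items()
        )
        background = collection_prob.get(w, floor)
        log_score += math.log((1 - lam) * translated + lam * background)
    return log_score


# Hand-written toy translation table (illustrative values only).
trans_prob = {
    ("fix", "repair"): 0.5,
    ("laptop", "notebook"): 0.6,
    ("screen", "display"): 0.5,
}

archive = {
    "d1": ["how", "repair", "broken", "notebook", "display"],
    "d2": ["best", "pizza", "recipe"],
}

query = ["fix", "laptop", "screen"]

# Rank archived questions by descending score.
ranked = sorted(
    archive,
    key=lambda name: translation_log_score(query, archive[name], trans_prob, {}),
    reverse=True,
)
print(ranked)
```

In this toy run the query shares no words with either archived question, yet the first one ranks higher because its words translate to the query's words. That is the behaviour the abstract describes as finding similar questions with relatively little word overlap.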
