A Markov random field model for term dependencies

TLDR

The paper introduces a general Markov random field framework for modeling term dependencies. The framework incorporates arbitrary text features, including single terms, ordered phrases, and unordered phrases, and explores full independence, sequential dependence, and full dependence variants. The model is trained by directly maximizing mean average precision rather than likelihood, and is evaluated on newswire and web collections, including GOV2. Modeling dependencies yields significant performance gains, especially on the larger web collections.

Abstract

This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows for arbitrary text features to be incorporated as evidence. In particular, we make use of features based on occurrences of single terms, ordered phrases, and unordered phrases. We explore full independence, sequential dependence, and full dependence variants of the model. A novel approach is developed to train the model that directly maximizes the mean average precision rather than maximizing the likelihood of the training data. Ad hoc retrieval experiments are presented on several newswire and web collections, including the GOV2 collection used at the TREC 2004 Terabyte Track. The results show significant improvements are possible by modeling dependencies, especially on the larger web collections.
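
To make the three variants concrete, the following is a minimal sketch of which groups of query terms each variant might score together. It is an illustration under stated assumptions, not the paper's implementation: the function name `term_groups`, the variant labels, and the choice to let the full dependence variant consider every subset of two or more terms are all hypothetical, and the feature functions and their weights are left out entirely.

```python
from itertools import combinations


def term_groups(terms, variant):
    """Return the groups of query terms scored together (hypothetical helper).

    Every variant includes the single-term groups. The dependence variants
    add multi-term groups on top of those.
    """
    singles = [(t,) for t in terms]

    if variant == "full_independence":
        # Assumption: terms contribute evidence independently of each other.
        return singles

    if variant == "sequential_dependence":
        # Assumption: only adjacent query terms are treated as dependent.
        adjacent_pairs = list(zip(terms, terms[1:]))
        return singles + adjacent_pairs

    if variant == "full_dependence":
        # Assumption: every subset of two or more terms may be dependent.
        subsets = [
            group
            for size in range(2, len(terms) + 1)
            for group in combinations(terms, size)
        ]
        return singles + subsets

    raise ValueError(f"unknown variant: {variant!r}")


if __name__ == "__main__":
    query = ["markov", "random", "field"]
    for name in ("full_independence", "sequential_dependence", "full_dependence"):
        print(name, term_groups(query, name))
```

Under these assumptions, the number of groups grows linearly with query length for the independence and sequential variants but exponentially for the full dependence variant, which is one reason the choice of variant matters for longer queries.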
