Publication | Closed Access
A Markov random field model for term dependencies
Citations: 844
References: 28
Year: 2005
Unknown Venue
Engineering, Intelligent Information Retrieval, Dependency Linguistics, Semantic Web, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Hidden Markov Model, Computational Linguistics, Named-entity Recognition, Statistics, Markov Random Fields, Knowledge Discovery, Terminology Extraction, Information Extraction, Keyword Extraction, Term Dependencies, Formal Framework
The paper introduces a general Markov random field framework for modeling term dependencies. The framework can incorporate arbitrary text features, here occurrences of single terms, ordered phrases, and unordered phrases, and is explored in full independence, sequential dependence, and full dependence variants. Training directly maximizes mean average precision rather than likelihood. Experiments on newswire and web collections, including GOV2, show that modeling dependencies yields significant gains, especially on the larger web collections.
This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows for arbitrary text features to be incorporated as evidence. In particular, we make use of features based on occurrences of single terms, ordered phrases, and unordered phrases. We explore full independence, sequential dependence, and full dependence variants of the model. A novel approach is developed to train the model that directly maximizes the mean average precision rather than maximizing the likelihood of the training data. Ad hoc retrieval experiments are presented on several newswire and web collections, including the GOV2 collection used at the TREC 2004 Terabyte Track. The results show significant improvements are possible by modeling dependencies, especially on the larger web collections.
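To make the three variants concrete, the sketch below enumerates which groups of query terms would receive single-term, ordered-phrase, and unordered-phrase features under each one. It rests on an assumed reading that the abstract does not spell out: full independence uses single terms only, sequential dependence adds adjacent term pairs, and full dependence adds contiguous runs (ordered) and all subsets of two or more terms (unordered). The function name `term_cliques` and the variant labels are hypothetical, chosen for illustration.

```python
from itertools import combinations


def term_cliques(query_terms, variant):
    """Enumerate term groups per feature type for one model variant.

    Assumed reading of the variants (not stated in the abstract):
      "independence": single terms only
      "sequential":   single terms plus adjacent term pairs
      "full":         single terms, contiguous runs (ordered),
                      and all subsets of 2+ terms (unordered)
    """
    terms = list(query_terms)
    n = len(terms)
    single = [(t,) for t in terms]
    ordered, unordered = [], []

    if variant == "independence":
        pass  # no multi-term groups
    elif variant == "sequential":
        pairs = [(terms[i], terms[i + 1]) for i in range(n - 1)]
        ordered = list(pairs)
        unordered = list(pairs)
    elif variant == "full":
        # Contiguous runs of two or more terms, in query order.
        ordered = [tuple(terms[i:j]) for i in range(n) for j in range(i + 2, n + 1)]
        # Every subset of two or more terms, contiguous or not.
        unordered = [c for k in range(2, n + 1) for c in combinations(terms, k)]
    else:
        raise ValueError(f"unknown variant: {variant!r}")

    return {"single": single, "ordered": ordered, "unordered": unordered}


if __name__ == "__main__":
    query = ["markov", "random", "field"]
    for variant in ("independence", "sequential", "full"):
        print(variant, term_cliques(query, variant))
```

The number of unordered groups in the full variant grows exponentially with query length, so this enumeration is only practical for short queries.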