Publication | Closed Access
Sequence modeling with mixtures of conditional maximum entropy distributions
Citations: 20
References: 24
Year: 2004
Venue: Unknown
Topics: Structured Prediction, Engineering, Machine Learning, Corpus Linguistics, Text Mining, Statistical Relational Learning, Word Embeddings, Natural Language Processing, Conditional Maxent Models, Data Science, Mixture Analysis, Computational Linguistics, Language Studies, Sequence Modelling, NLP Task, Knowledge Discovery, Conditional Maximum Entropy, Computer Science, Maxent Framework, Mixture Distribution, Entropy, Statistical Inference, Linguistics
We present a novel approach to modeling sequences using mixtures of conditional maximum entropy (maxent) distributions. Our method generalizes the mixture of first-order Markov models by including "long-term" dependencies in the model components. These dependencies are represented by probabilistic triggers or rules, frequently used in the natural language processing (NLP) domain, such as "A occurred k positions back → the current symbol is B" with probability P. The maxent framework is then used to combine all selected triggers into a coherent global probabilistic model. We enhance this formalism by using probabilistic mixtures with maxent models as components, thus representing hidden or unobserved effects in the data. We demonstrate how our mixture of conditional maxent models can be learned from data using a generalized EM algorithm that scales linearly in the dimensionality of the data and the number of mixture components. We present empirical results on simulated and real-world data sets, demonstrating that the proposed approach yields better models than mixtures of first-order Markov models while resisting the overfitting and curse of dimensionality that would inevitably afflict higher-order Markov models.
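The paper itself is closed access; the following is a minimal sketch, not the authors' implementation, of how a mixture of conditional maxent models over trigger features might be trained with a generalized EM algorithm as the abstract describes. The alphabet size `V`, trigger window `K`, component count `M`, learning rate `LR`, and the specific feature set are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a mixture of conditional
# maxent models with trigger features of the form "symbol a occurred
# k positions back AND the current symbol is b", trained by
# generalized EM. V, K, M, and LR are illustrative assumptions.
import numpy as np
from scipy.special import logsumexp

V, K, M = 4, 2, 3   # alphabet size, trigger window, mixture components
LR = 0.1            # step size for the generalized (gradient) M-step

rng = np.random.default_rng(0)
lam = 0.01 * rng.standard_normal((M, V, K, V))  # one weight per trigger (a, k, b)
log_pi = np.log(np.full(M, 1.0 / M))            # log mixing proportions

def trigger_scores(seq, t):
    """Unnormalized maxent scores over the next symbol at position t:
    the sum of the weights of all triggers active given the history."""
    scores = np.zeros((M, V))
    for k in range(1, K + 1):
        if t - k >= 0:
            scores += lam[:, seq[t - k], k - 1, :]
    return scores

def component_loglik(seq):
    """log P_m(seq) for each component m: the product over positions
    of that component's conditional maxent distribution."""
    total = np.zeros(M)
    for t in range(len(seq)):
        s = trigger_scores(seq, t)
        total += s[:, seq[t]] - logsumexp(s, axis=1)
    return total

def gem_step(sequences):
    """One generalized EM iteration: an exact E-step (responsibilities),
    then a single gradient step on each component's responsibility-
    weighted conditional log-likelihood. Each iteration is linear in
    the total data size and in the number of components M."""
    global lam, log_pi
    grad = np.zeros_like(lam)
    resp_sum = np.zeros(M)
    for seq in sequences:
        log_joint = log_pi + component_loglik(seq)
        r = np.exp(log_joint - logsumexp(log_joint))  # E-step
        resp_sum += r
        for t in range(len(seq)):
            s = trigger_scores(seq, t)
            probs = np.exp(s - logsumexp(s, axis=1, keepdims=True))
            # Maxent gradient: observed minus expected feature counts,
            # weighted by each component's responsibility r_m.
            for k in range(1, K + 1):
                if t - k >= 0:
                    a = seq[t - k]
                    grad[:, a, k - 1, seq[t]] += r
                    grad[:, a, k - 1, :] -= r[:, None] * probs
    lam += LR * grad                              # generalized M-step
    log_pi = np.log(resp_sum / resp_sum.sum())    # closed-form update for pi

# Toy usage: 50 random sequences over the 4-symbol alphabet.
seqs = [rng.integers(0, V, size=20) for _ in range(50)]
for _ in range(10):
    gem_step(seqs)
```

Taking a single gradient step per M-step, instead of fully re-fitting each maxent component (e.g., by iterative scaling), is what makes each iteration linear in the data size and in M, consistent with the scaling claim in the abstract.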