Max-Margin Markov Networks

TLDR

Typical classification tasks assign a label to a single object, and kernel-based methods like SVMs are popular for their high-dimensional feature handling and strong theoretical guarantees, but they ignore structure in sequential or spatial data, while probabilistic graphical models capture label correlations yet lack high-dimensional capability and generalization guarantees. This paper introduces Maximum margin Markov (M3) networks, a framework that merges kernel methods with Markov network structure to simultaneously handle high-dimensional features and capture label correlations. An efficient learning algorithm for M3 networks is presented, formulated as a compact quadratic program. The authors derive a new generalization bound for structured domains and demonstrate that M3 networks achieve significant performance gains on handwritten character recognition and collective hypertext classification compared to prior approaches.

Abstract

In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M3) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches.

References

Page 1

	Year	Citations

Page 1