Maximum entropy models for natural language ambiguity resolution

TLDR

This thesis applies a single, task-independent maximum-entropy framework, implemented in one software system, to sentence-boundary detection, part-of-speech tagging, prepositional-phrase attachment, parsing, and text categorization. Using only linguistically simple, knowledge-poor features, the models perform at or near state-of-the-art accuracy on each task; the methods that outperform them require more supervision, in the form of additional human involvement or supporting resources. The experiments suggest that state-of-the-art accuracy is attainable across a wide range of natural-language tasks with little task-specific effort.

Abstract

This thesis demonstrates that several important kinds of natural language ambiguities can be resolved to state-of-the-art accuracies using a single statistical modeling technique based on the principle of maximum entropy. We discuss the problems of sentence boundary detection, part-of-speech tagging, prepositional phrase attachment, natural language parsing, and text categorization under the maximum entropy framework. In practice, we have found that maximum entropy models offer the following advantages:

State-of-the-art accuracy. The probability models for all of the tasks discussed perform at or near state-of-the-art accuracies, or outperform competing learning algorithms when trained and tested under similar conditions. Methods that outperform those presented here require much more supervision in the form of additional human involvement or additional supporting resources.

Knowledge-poor features. The facts used to model the data, or features, are linguistically very simple, or knowledge-poor, yet succeed in approximating complex linguistic relationships.

Reusable software technology. The mathematics of the maximum entropy framework are essentially independent of any particular task, and a single software implementation can be used for all of the probability models in this thesis.

The experiments in this thesis suggest that experimenters can obtain state-of-the-art accuracies on a wide range of natural language tasks, with little task-specific effort, by using maximum entropy probability models.