Publication | Open Access
Discriminative language modeling with conditional random fields and the perceptron algorithm
139 Citations · 21 References · 2004 · Venue unknown
Topics: Engineering, Machine Learning, Spoken Language Processing, Multilingual Pretraining, Large Language Model, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Computational Linguistics, Language Engineering, Grammar, Language Studies, Machine Translation, NLP Tasks, Computer Science, Perceptron Algorithm, Conditional Random Fields, Language Recognition, Speech Processing, Speech Input, Discriminative Language, Linguistics, POS Tagging
The paper proposes discriminative language modeling for large‑vocabulary speech recognition. It contrasts perceptron and CRF parameter‑estimation methods, encoding models as deterministic weighted finite‑state automata that are intersected with baseline recognizer word lattices. Perceptron training automatically selects a compact feature set in just a few passes, and when used to initialize CRF training yields an extra 0.5% WER reduction, for a total 1.8% absolute drop from the 39.2% baseline.
This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with their weights), CRF training provides an additional 0.5% reduction in word error rate, for a total 1.8% absolute reduction from the baseline of 39.2%.
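The abstract's key mechanism is structured-perceptron training of a discriminative language model: candidates from the baseline recognizer are rescored with sparse n-gram feature weights, and a mistake-driven update promotes the oracle (lowest-error) hypothesis while demoting the model's current pick, so only features touched by an update acquire nonzero weight. The sketch below illustrates this idea on n-best lists rather than the paper's word lattices and finite-state intersection; all function names and the toy data are illustrative, not from the paper.

```python
from collections import defaultdict

def ngram_feats(words, n=2):
    """Count unigram and bigram features of one hypothesis (word list)."""
    feats = defaultdict(int)
    padded = ["<s>"] + words + ["</s>"]
    for i, w in enumerate(padded):
        feats[(w,)] += 1
        if i > 0:
            feats[(padded[i - 1], w)] += 1
    return feats

def perceptron_train(nbest_lists, oracle_indices, passes=2):
    """Structured-perceptron training over n-best lists.

    nbest_lists: one candidate list per utterance, each candidate a word list.
    oracle_indices: index of the lowest-error candidate per utterance.
    Returns sparse weights over n-gram features; features never involved
    in a mistake stay at zero (the implicit feature selection the paper
    attributes to perceptron training).
    """
    w = defaultdict(float)
    for _ in range(passes):
        for cands, oracle in zip(nbest_lists, oracle_indices):
            # Model-best candidate under the current weights.
            scores = [sum(w[f] * c for f, c in ngram_feats(h).items())
                      for h in cands]
            pred = max(range(len(cands)), key=lambda i: scores[i])
            if pred != oracle:
                # Promote oracle features, demote predicted features.
                for f, c in ngram_feats(cands[oracle]).items():
                    w[f] += c
                for f, c in ngram_feats(cands[pred]).items():
                    w[f] -= c
    return w
```

In the paper the resulting weights would then seed CRF training over the same feature set; here the sketch stops at the perceptron stage.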