A Scalable Hierarchical Distributed Language Model

TLDR

Neural probabilistic language models are competitive with n-gram models but suffer from very long training and testing times; an earlier hierarchical model was much faster but considerably less accurate. This work introduces a fast hierarchical language model together with a simple feature-based algorithm that automatically constructs word trees from data. The resulting models can outperform both non-hierarchical neural models and the best n-gram models.

Abstract

Neural probabilistic language models (NPLMs) have been shown to be competitive with and occasionally superior to the widely-used n-gram language models. The main drawback of NPLMs is their extremely long training and testing times. Morin and Bengio have proposed a hierarchical language model built around a binary tree of words, which was two orders of magnitude faster than the non-hierarchical model it was based on. However, it performed considerably worse than its non-hierarchical counterpart in spite of using a word tree created using expert knowledge. We introduce a fast hierarchical language model along with a simple feature-based algorithm for automatic construction of word trees from the data. We then show that the resulting models can outperform non-hierarchical neural models as well as the best n-gram models.
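
To illustrate why a binary tree of words reduces the cost of evaluating a word's probability, here is a minimal sketch. It assumes each internal tree node makes a logistic (sigmoid) left/right decision from a context vector, so a word's probability is the product of the decisions along its root-to-leaf path. The names `word_probability`, `paths`, `node_vecs`, `node_biases`, and `context_vec`, the four-word vocabulary, and the random parameters are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def word_probability(code, node_ids, context_vec, node_vecs, node_biases):
    """P(word | context) as a product of binary decisions along the word's path.

    code     : list of 0/1 decisions from the root to the word's leaf
    node_ids : internal nodes visited, one per decision
    """
    prob = 1.0
    for bit, node in zip(code, node_ids):
        # Assumed form of a node decision: sigmoid of a dot product plus a bias.
        score = sum(c * q for c, q in zip(context_vec, node_vecs[node]))
        p_one = sigmoid(score + node_biases[node])
        prob *= p_one if bit == 1 else 1.0 - p_one
    return prob


# Toy balanced tree over 4 words; internal nodes are 0 (root), 1, and 2.
paths = {
    "cat": ([1, 1], [0, 1]),
    "dog": ([1, 0], [0, 1]),
    "car": ([0, 1], [0, 2]),
    "bus": ([0, 0], [0, 2]),
}

rng = random.Random(0)
dim = 3
node_vecs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
node_biases = [0.0, 0.0, 0.0]
context_vec = [rng.uniform(-1, 1) for _ in range(dim)]  # stand-in for the context representation

probs = {
    word: word_probability(code, nodes, context_vec, node_vecs, node_biases)
    for word, (code, nodes) in paths.items()
}
total = sum(probs.values())  # leaf probabilities sum to 1 by construction
```

In a balanced tree over V words, each word requires roughly log2(V) binary decisions rather than a normalization over all V words, which is where the speedup of hierarchical models comes from. The leaf probabilities sum to one without any explicit normalization step.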
