Publication | Closed Access
Strategies for training large scale neural network language models
Citations: 512
References: 21
Year: 2011
Unknown Venue
Engineering, Machine Learning, Neural Network, Computational Complexity, Multilingual Pretraining, Large Language Models, Text Mining, Speech Recognition, Natural Language Processing, Data Science, Computational Linguistics, Language Studies, Language Models, Machine Translation, Large AI Model, Sequence Modelling, Computer Science, Linguistics, POS Tagging
The paper proposes methods for efficiently training neural network language models on large data sets. Sorting the training data by relevance gives faster convergence and better overall performance. A hash-based maximum entropy model, trained as part of the neural network, significantly reduces computational complexity. The authors report around 10% relative reduction in word error rate on an English Broadcast News speech recognition task, against a large 4-gram model trained on 400M tokens.
We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as a part of the neural network model. This leads to a significant reduction of computational complexity. We achieved around 10% relative reduction of word error rate on an English Broadcast News speech recognition task, against a large 4-gram model trained on 400M tokens.
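The idea of a hash-based maximum entropy model can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the maximum entropy part, without the neural network it would be trained alongside, and every name in it (`HASH_SIZE`, `feature_index`, `active_features`, `predict`, `sgd_step`), the use of CRC32 as the hash function, and the table size are assumptions made for illustration. Each n-gram feature (a context plus a candidate next word) is hashed into a fixed-size weight array, so memory stays bounded no matter how many distinct n-grams occur in the data; colliding features simply share a weight.

```python
import math
import zlib

HASH_SIZE = 1 << 16  # hypothetical size of the hashed weight table


def feature_index(context, word, size=HASH_SIZE):
    """Map an n-gram feature (context words + candidate word) to a table slot."""
    key = " ".join(tuple(context) + (word,)).encode("utf-8")
    return zlib.crc32(key) % size


def active_features(history, word, max_order=3, size=HASH_SIZE):
    """Hashed indices of the unigram, bigram, ... features for `word`."""
    indices = []
    for n in range(max_order):
        if n > len(history):
            break  # not enough history for this order
        context = tuple(history[-n:]) if n else ()
        indices.append(feature_index(context, word, size))
    return indices


def predict(weights, history, vocab, max_order=3):
    """Softmax over the summed weights of each word's active features."""
    size = len(weights)
    scores = [
        sum(weights[i] for i in active_features(history, w, max_order, size))
        for w in vocab
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights, history, target, vocab, lr=0.1, max_order=3):
    """One stochastic gradient step on the log-likelihood; returns the loss."""
    size = len(weights)
    probs = predict(weights, history, vocab, max_order)
    for w, p in zip(vocab, probs):
        grad = (1.0 if w == target else 0.0) - p
        for i in active_features(history, w, max_order, size):
            weights[i] += lr * grad
    return -math.log(probs[vocab.index(target)])
```

As a usage example, `weights = [0.0] * HASH_SIZE` followed by repeated calls to `sgd_step(weights, ["the"], "cat", ["cat", "dog", "the"])` should steadily lower the returned loss for that context. A smaller table saves memory at the cost of more collisions between unrelated features.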