Publication | Open Access
Using the Output Embedding to Improve Language Models
Citations: 91 | References: 38 | Year: 2017
Unknown Venue
Topics: Natural Language Processing, Valid Word Embedding, Large AI Model, Syntax, Engineering, Machine Learning, Output Embedding, Cross-lingual Representation, Computational Linguistics, Update Rules, Neural Machine Translation, Computer Science, Language Studies, Large Language Model, Language Models, Linguistics, Machine Translation, Word Embeddings
The authors study the topmost weight matrix of neural network language models, show that it constitutes a valid word embedding, and recommend tying it to the input embedding. Their analysis of the update rules indicates that the tied embedding evolves more like the untied model's output embedding than like its input embedding. They also propose a new method of regularizing the output embedding. Together, these methods significantly reduce perplexity on a variety of neural network language models, and weight tying shrinks neural translation models to less than half their original size without harming performance.
We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
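To make the idea of tying concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class `TinyLM`, the helper `init_matrix`, and the sizes used are all hypothetical, and the "hidden state" is simply the input embedding of the current word rather than the output of a recurrent network. It only illustrates that a tied model reuses one matrix for both the input lookup and the output scoring, which halves the embedding parameter count.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Small random matrix stored as a list of rows (hypothetical helper)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLM:
    """Toy one-step model: embed a word, then score every vocabulary word."""

    def __init__(self, vocab_size, dim, tied, seed=0):
        rng = random.Random(seed)
        self.tied = tied
        # Input embedding: one row per vocabulary word.
        self.input_emb = init_matrix(vocab_size, dim, rng)
        # Tying: the output layer reuses the very same matrix object,
        # so any update to one is an update to the other.
        if tied:
            self.output_emb = self.input_emb
        else:
            self.output_emb = init_matrix(vocab_size, dim, rng)

    def num_embedding_params(self):
        rows, cols = len(self.input_emb), len(self.input_emb[0])
        return rows * cols if self.tied else 2 * rows * cols

    def next_word_probs(self, word_id):
        # Assumption: the hidden state is just the input embedding
        # (a real language model would run an RNN or similar here).
        h = self.input_emb[word_id]
        logits = [sum(w * x for w, x in zip(row, h)) for row in self.output_emb]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


untied = TinyLM(vocab_size=1000, dim=32, tied=False)
tied = TinyLM(vocab_size=1000, dim=32, tied=True)
print(untied.num_embedding_params(), tied.num_embedding_params())
# prints: 64000 32000
```

The saving shown here covers only the two embedding matrices; how much a full model shrinks depends on how large the remaining layers are relative to the embeddings.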