Representation Degeneration Problem in Training Natural Language Generation Models

Abstract

We study a problem that arises when training neural network-based models for natural language generation tasks, which we call the representation degeneration problem. We observe that when a model for natural language generation is trained through likelihood maximization with the weight tying trick, especially on large training datasets, most of the learnt word embeddings tend to degenerate and become distributed in a narrow cone, which substantially limits the representation power of the word embeddings. We analyze the conditions and causes of this problem and propose a novel regularization method to address it. Experiments on language modeling and machine translation show that our method largely mitigates the representation degeneration problem and achieves better performance than baseline algorithms.
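
One way to make the "narrow cone" observation concrete is to measure the average pairwise cosine similarity of an embedding matrix: values near 0 suggest directions spread widely, while values near 1 suggest most vectors point the same way. The sketch below is an illustrative diagnostic only and is not the regularization method proposed in the paper. The function name `mean_pairwise_cosine` and the synthetic matrices are assumptions made for this example, not learnt embeddings.

```python
import numpy as np


def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero rows
    sims = unit @ unit.T
    n = unit.shape[0]
    # Subtract the diagonal so self-similarity does not inflate the mean.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))


rng = np.random.default_rng(0)

# Synthetic stand-in for well-spread embeddings (assumption: Gaussian rows).
isotropic = rng.standard_normal((500, 64))

# Synthetic stand-in for degenerate embeddings: a large shared offset
# pushes every vector toward one common direction.
cone = isotropic + 5.0 * np.ones(64)

iso_score = mean_pairwise_cosine(isotropic)
cone_score = mean_pairwise_cosine(cone)
print(f"isotropic: {iso_score:.3f}, cone: {cone_score:.3f}")
```

Applied to a trained model, the same function could be called on the tied embedding matrix; a score close to 1 would be consistent with the degeneration described above.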
