The Curious Case of Neural Text Degeneration

TLDR

Despite advances in neural language modeling, the best decoding strategy for text generation remains unclear. The authors propose Nucleus Sampling, which truncates the unreliable tail of the probability distribution and samples from the dynamic nucleus of tokens holding most of the probability mass, to generate higher-quality text from neural language models. They compare beam search, other stochastic decoding methods, and Nucleus Sampling by evaluating generated text against human-written text on likelihood, diversity, and repetition. The study finds that maximization-based decoding causes degeneration, that language models have unreliable tails that need truncation, and that Nucleus Sampling produces high-quality long-form text that is as diverse as human writing.
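A minimal sketch of the truncation idea, assuming the model's next-token distribution is already available as a plain list of probabilities. The function names `nucleus_filter` and `nucleus_sample` and the threshold argument `p` are illustrative, not taken from the authors' code. The nucleus is the smallest set of most-probable tokens whose cumulative mass reaches `p`; everything outside it (the tail) is discarded and the rest is renormalized before sampling.

```python
import random
from typing import Sequence


def nucleus_filter(probs: Sequence[float], p: float) -> dict[int, float]:
    """Return the renormalized nucleus as {token_index: probability}."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus: list[int] = []
    cumulative = 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        # Stop at the smallest prefix whose mass reaches the threshold p.
        if cumulative >= p:
            break
    # Renormalize the kept mass so it sums to 1; the tail is dropped.
    total = sum(probs[i] for i in nucleus)
    return {i: probs[i] / total for i in nucleus}


def nucleus_sample(probs: Sequence[float], p: float, rng: random.Random) -> int:
    """Draw one token index from the truncated, renormalized distribution."""
    nucleus = nucleus_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[i] for i in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]  # toy next-token distribution
    print(nucleus_filter(probs, 0.9))  # keeps tokens 0, 1, 2; drops token 3
    print(nucleus_sample(probs, 0.9, random.Random(0)))
```

Because the cutoff is defined by probability mass rather than a fixed count, the nucleus is dynamic: a flat distribution keeps many candidate tokens, while a sharply peaked one keeps only a few.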

Abstract

Despite considerable advances in neural language modeling, it remains an open question what the best decoding strategy is for text generation from a language model (e.g., to generate a story). The counter-intuitive empirical observation is that even though the use of likelihood as a training objective leads to high-quality models for a broad range of language understanding tasks, maximization-based decoding methods such as beam search lead to degeneration: output text that is bland, incoherent, or gets stuck in repetitive loops. To address this, we propose Nucleus Sampling, a simple but effective method to draw considerably higher-quality text out of neural language models. Our approach avoids text degeneration by truncating the unreliable tail of the probability distribution, sampling from the dynamic nucleus of tokens containing the vast majority of the probability mass. To properly examine current maximization-based and stochastic decoding methods, we compare generations from each of these methods to the distribution of human text along several axes such as likelihood, diversity, and repetition. Our results show that (1) maximization is an inappropriate decoding objective for open-ended text generation, (2) the probability distributions of the best current language models have an unreliable tail which needs to be truncated during generation, and (3) Nucleus Sampling is the best decoding strategy for generating long-form text that is both high-quality (as measured by human evaluation) and as diverse as human-written text.
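To make the diversity and repetition axes concrete, here is a sketch of one common proxy, distinct-n: the ratio of unique n-grams to total n-grams in a generation. This is an assumed stand-in for illustration and not necessarily the measure the paper uses; the whitespace tokenization and the function name `distinct_n` are also assumptions. Degenerate, looping output scores low, while varied text scores close to 1.

```python
def distinct_n(tokens: list[str], n: int) -> float:
    """Ratio of unique n-grams to all n-grams; 0.0 if there are none."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    return len(set(grams)) / len(grams)


if __name__ == "__main__":
    # A repetitive loop of the kind maximization-based decoding can fall into.
    looped = "i do not know . i do not know . i do not know .".split()
    varied = "the cat sat on the mat and then it slept".split()
    print(distinct_n(looped, 2))  # 5 unique bigrams out of 14
    print(distinct_n(varied, 2))  # every bigram is unique
```

A comparison in the spirit of the abstract would compute such a score for each decoding method and check how close it lands to the score of human-written text, rather than simply maximizing it.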
