Publication | Open Access
On the Dangers of Stochastic Parrots
Year: 2021 | Venue: Unknown | Citations: 4.7K | References: 78
Recent NLP progress has been driven by ever larger language models such as BERT, GPT‑2/3, and Switch‑C, whose architectural innovations and sheer scale have expanded the state of the art across many English benchmark tasks through fine‑tuning. This work questions whether model size can become excessive, examining the risks of overly large language models and exploring mitigation strategies. The authors recommend prioritizing environmental and financial cost assessments, curating and documenting datasets instead of indiscriminate web ingestion, conducting pre‑development evaluations of alignment with research goals and stakeholder values, and pursuing research directions beyond continually scaling models.
The past 3 years of work in NLP have been characterized by the development and deployment of ever larger language models, especially for English. BERT, its variants, GPT-2/3, and others, most recently Switch-C, have pushed the boundaries of the possible both through architectural innovations and through sheer size. Using these pretrained models and the methodology of fine-tuning them for specific tasks, researchers have extended the state of the art on a wide array of tasks as measured by leaderboards on specific benchmarks for English. In this paper, we take a step back and ask: How big is too big? What are the possible risks associated with this technology and what paths are available for mitigating those risks? We provide recommendations including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, carrying out pre-development exercises evaluating how the planned approach fits into research and development goals and supports stakeholder values, and encouraging research directions beyond ever larger language models.