Concepedia

TLDR

Pre‑trained language models have revolutionized NLP by shifting the field from task‑specific supervised learning to a pre‑training plus fine‑tuning paradigm, sparking extensive research into improving these models. This review surveys key advances in pre‑trained models, introduces a taxonomy of such models, and outlines future research directions. The authors describe the architectures, characteristic methods, and frameworks of pre‑trained models, and analyze their impact, challenges, and downstream applications.

Abstract

Pre-trained language models have achieved striking success in natural language processing (NLP), leading to a paradigm shift from supervised learning to pre-training followed by fine-tuning. The NLP community has witnessed a surge of research interest in improving pre-trained models. This article presents a comprehensive review of representative work and recent progress in the NLP field and introduces a taxonomy of pre-trained models. We first give a brief introduction to pre-trained models, followed by their characteristic methods and frameworks. We then introduce and analyze the impact and challenges of pre-trained models and their downstream applications. Finally, we briefly conclude and address future research directions in this field.
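
To make the pre-training plus fine-tuning paradigm concrete, the sketch below loads a pre-trained BERT encoder and fine-tunes it, together with a newly attached classification head, on a toy sentiment task. The Hugging Face `transformers` library, the `bert-base-uncased` checkpoint, the example sentences, and the hyperparameters are all illustrative assumptions; the survey itself does not prescribe any particular toolkit.

```python
# Minimal pre-train/fine-tune sketch (illustrative; not from the surveyed paper).
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load weights learned during unsupervised pre-training; the classification
# head for our 2-label task is new and randomly initialized.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical labeled examples standing in for a downstream corpus.
texts = ["a gripping, well-acted film", "flat characters and a dull plot"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# A small learning rate nudges, rather than overwrites, the pre-trained weights.
optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    loss = model(**batch, labels=labels).loss  # cross-entropy over the new head
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"epoch {epoch}: loss = {loss.item():.4f}")
```

In contrast to the older fully supervised pipeline, only the small head starts from scratch here; the encoder's parameters arrive already trained on unlabeled text, which is what allows fine-tuning to succeed with relatively little labeled data.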
