Finetuned Language Models Are Zero-Shot Learners

TLDR

The paper investigates a simple method for improving the zero-shot learning abilities of language models: instruction tuning. The authors instruction-tune a 137-billion-parameter pretrained model on over 60 NLP tasks expressed through natural-language instruction templates, then evaluate the resulting model, FLAN, on unseen task types. Instruction tuning substantially improves zero-shot performance: FLAN outperforms zero-shot 175-billion-parameter GPT-3 on 20 of 25 evaluated tasks and surpasses few-shot GPT-3 on several benchmarks. Ablation studies indicate that the number of finetuning datasets, model scale, and natural-language instructions are key to these gains.

Abstract

This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20 of 25 tasks that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that number of finetuning datasets, model scale, and natural language instructions are key to the success of instruction tuning.
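To make "tasks verbalized via natural language instruction templates" more concrete, the following is a minimal Python sketch of how a single natural language inference example might be turned into several instruction-style training pairs. The templates, the `NLIExample` class, and the function names are illustrative assumptions, not the templates or code used by the authors.

```python
from dataclasses import dataclass


@dataclass
class NLIExample:
    """One labeled example from a hypothetical entailment dataset."""
    premise: str
    hypothesis: str
    label: str  # target text the model should produce, e.g. "yes" or "no"


# Hypothetical instruction templates, written for illustration only.
# Each phrases the same underlying task in different natural language.
TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?",
    "{premise}\nBased on the paragraph above, can we conclude that "
    "\"{hypothesis}\"?",
]


def verbalize(example: NLIExample, template: str) -> tuple[str, str]:
    """Fill one template with one example, returning (prompt, target)."""
    prompt = template.format(
        premise=example.premise, hypothesis=example.hypothesis
    )
    return prompt, example.label


def build_finetuning_pairs(
    examples: list[NLIExample], templates: list[str]
) -> list[tuple[str, str]]:
    """Pair every example with every template to form training data."""
    return [verbalize(ex, t) for ex in examples for t in templates]
```

Under these assumptions, calling `build_finetuning_pairs` on one example with the two templates above yields two (prompt, target) pairs that share the same target. Pooling such pairs across many datasets would give a finetuning mixture of the general kind the abstract describes, with evaluation run on task types held out of that mixture.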
