
TLDR

The clinical NLP community aims to extract key variables from clinical notes, but faces dataset shift and a lack of public corpora. This study demonstrates that large language models such as InstructGPT can perform zero‑ and few‑shot information extraction from clinical text. The authors leverage these models for span identification, token‑level sequence classification, and relation extraction, and introduce new datasets based on a re‑annotation of CASI for benchmarking. GPT‑3 systems significantly outperform existing zero‑ and few‑shot baselines on the studied clinical extraction tasks.

Abstract

A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes. However, roadblocks have included dataset shift from the general domain and a lack of public clinical corpora and annotations. In this work, we show that large language models, such as InstructGPT (Ouyang et al., 2022), perform well at zero- and few-shot information extraction from clinical text despite not being trained specifically for the clinical domain. Whereas text classification and generation performance have already been studied extensively in such models, here we additionally demonstrate how to leverage them to tackle a diverse set of NLP tasks which require more structured outputs, including span identification, token-level sequence classification, and relation extraction. Further, due to the dearth of available data to evaluate these systems, we introduce new datasets for benchmarking few-shot clinical information extraction based on a manual re-annotation of the CASI dataset (Moon et al., 2014) for new tasks. On the clinical extraction tasks we studied, the GPT-3 systems significantly outperform existing zero- and few-shot baselines.
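To make the structured-output idea concrete: since an LLM returns free text rather than offsets, its answer must be resolved back into the note. The sketch below is a hypothetical, deliberately simplified resolver (a case-insensitive substring search), not the paper's actual pipeline; the function name `resolve_span` and the example note are invented for illustration.

```python
def resolve_span(note: str, answer: str):
    """Map a model's free-text answer back to a character span in the note.

    Hypothetical simplification: real systems must handle paraphrases,
    normalization, and multiple matches; here we only do a
    case-insensitive substring search. Returns (start, end) character
    offsets, or None if the answer text is not found verbatim.
    """
    needle = answer.strip().lower()
    idx = note.lower().find(needle)
    if idx == -1:
        return None
    return (idx, idx + len(needle))

# Example: suppose a medication-extraction prompt elicited "Metoprolol".
note = "Pt c/o SOB and chest pain; started on metoprolol 25mg."
span = resolve_span(note, "Metoprolol")
print(span, note[span[0]:span[1]])  # (38, 48) metoprolol
```

A resolver like this is what turns a generative model into a span-identification or token-classification system: the generation step stays task-agnostic, and only the post-processing is task-specific.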
