Concepedia

Publication | Closed Access

Few-Shot Named Entity Recognition via Meta-Learning

Citations: 154

References: 50

Year: 2020

TLDR

Few-shot learning has been extensively explored in relation extraction and image classification, yet the N-way K-shot setting for named entity recognition (NER) remains largely unstudied because entity classes are entangled in sequence labeling. This work formally defines a suitable N-way K-shot framework for NER. The proposed FewNER model splits into a task-independent component, meta-learned across tasks, and a task-specific component, trained per task in a low-dimensional space; at test time, the model adapts via gradient descent on the task-specific part alone, explicitly optimizing for rapid adaptation and reducing overfitting compared with the implicit transfer of pre-trained language models. FewNER outperforms nine baselines by large margins in three adaptation scenarios (intra-domain cross-type, cross-domain intra-type, and cross-domain cross-type), establishing state-of-the-art performance.
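To make the N-way K-shot setting concrete, the sketch below builds a toy support set by sampling N entity classes and K labeled sentences per class. The function name, data format, and example data are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch of constructing an N-way K-shot episode for NER.
# The class names and sentences below are invented for illustration.
import random

def sample_episode(data_by_class, n_way=2, k_shot=2, seed=0):
    """Pick N entity classes, then K annotated samples for each class."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(data_by_class), n_way)   # choose N classes
    support = {c: rng.sample(data_by_class[c], k_shot)   # K shots per class
               for c in classes}
    return classes, support

data = {
    "PER": ["Alice visited Rome", "Bob met Carol", "Dave called Eve"],
    "LOC": ["Paris is large", "Berlin in spring", "Oslo by night"],
    "ORG": ["ACME hired staff", "UN convened", "IBM shipped chips"],
}
classes, support = sample_episode(data, n_way=2, k_shot=2)
print(len(classes), [len(v) for v in support.values()])
```

In an actual NER episode the sampled sentences would carry token-level labels, and sentences can contain entities of several classes at once, which is exactly the entanglement the paper's formal definition has to handle.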

Abstract

Few-shot learning under the <inline-formula><tex-math notation="LaTeX">$N$</tex-math></inline-formula>-way <inline-formula><tex-math notation="LaTeX">$K$</tex-math></inline-formula>-shot setting (i.e., <inline-formula><tex-math notation="LaTeX">$K$</tex-math></inline-formula> annotated samples for each of <inline-formula><tex-math notation="LaTeX">$N$</tex-math></inline-formula> classes) has been widely studied in relation extraction (e.g., FewRel) and image classification (e.g., Mini-ImageNet). Named entity recognition (NER) is typically framed as a sequence labeling problem where the entity classes are inherently entangled, because the number and classes of entities in a sentence are not known in advance, leaving the <inline-formula><tex-math notation="LaTeX">$N$</tex-math></inline-formula>-way <inline-formula><tex-math notation="LaTeX">$K$</tex-math></inline-formula>-shot NER problem so far unexplored. In this paper, we first formally define a more suitable <inline-formula><tex-math notation="LaTeX">$N$</tex-math></inline-formula>-way <inline-formula><tex-math notation="LaTeX">$K$</tex-math></inline-formula>-shot setting for NER. Then we propose <small>FewNER</small>, a novel meta-learning approach for few-shot NER. <small>FewNER</small> separates the entire network into a task-independent part and a task-specific part. During training, the task-independent part is meta-learned across multiple tasks, and the task-specific part is learned for each individual task in a low-dimensional space. At test time, <small>FewNER</small> keeps the task-independent part fixed and adapts to a new task via gradient descent, updating only the task-specific part, which makes it less prone to overfitting and more computationally efficient. Compared with pre-trained language models (e.g., BERT and ELMo), which obtain transferability implicitly by relying on large-scale corpora, <small>FewNER</small> explicitly optimizes the capability of "learning to adapt quickly" through meta-learning. The results demonstrate that <small>FewNER</small> outperforms nine baseline methods by significant margins, achieving state-of-the-art performance on three adaptation experiments (i.e., intra-domain cross-type, cross-domain intra-type, and cross-domain cross-type).
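The test-time adaptation described above can be sketched in miniature: freeze the task-independent weights and run gradient descent only on a low-dimensional task-specific parameter. This is a minimal sketch assuming a linear scorer with squared loss; the paper's actual model is a neural sequence labeler, and all names here are illustrative.

```python
# Minimal sketch of FewNER-style test-time adaptation (illustrative only):
# the task-independent weights W stay frozen; only the low-dimensional
# task-specific vector z is updated by gradient descent on the support set.
import numpy as np

def adapt(W, z, support_x, support_y, lr=0.5, steps=20):
    """Inner-loop adaptation: update only z, leaving W untouched."""
    for _ in range(steps):
        pred = support_x @ W + z        # z shifts the class scores per task
        err = pred - support_y          # squared-loss residual
        grad_z = err.mean(axis=0)       # gradient w.r.t. z only
        z = z - lr * grad_z             # W is deliberately never updated
    return z

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))             # frozen task-independent part
z0 = np.zeros(3)                        # task-specific part, low-dimensional
x = rng.normal(size=(8, 4))             # K-shot support features
y = x @ W + 1.5                         # a "new task": uniformly shifted scores
z = adapt(W, z0, x, y)
print(np.round(z, 3))                   # z converges toward the task shift
```

Because only the small vector `z` is optimized, adaptation touches few parameters, which mirrors the paper's argument for why this scheme is less prone to overfitting on K-shot data and cheaper than fine-tuning the whole network.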
