Meta-learning for Few-shot Natural Language Processing: A Survey

Wenpeng Yin

TL;DR

This survey reviews meta-learning for few-shot NLP, detailing how learning-to-learn across many annotated tasks enables rapid adaptation to new NLP tasks with only a small labeled support set. It contrasts metric-based approaches (e.g., Siamese, Matching Networks, Prototypical Networks, Relation Networks) with optimization-based methods (e.g., MAML, FOMAML, Reptile), summarizing their mechanisms and NLP adaptations. It presents NLP progress along two axes: within a single problem across domains and across diverse problems to improve generalization to unseen tasks. It highlights datasets such as FewRel, SNIPS, CLINC150, and ARSC to illustrate evaluation, and argues for more realistic cross-distribution benchmarks to advance the field.

Abstract

Few-shot natural language processing (NLP) refers to NLP tasks that are accompanied by merely a handful of labeled examples. This is a real-world challenge that an AI system must learn to handle. Usually we rely on collecting more auxiliary information or developing a more efficient learning algorithm. However, general gradient-based optimization in high-capacity models, when training from scratch, requires many parameter-updating steps over a large number of labeled examples to perform well (Snell et al., 2017). If the target task itself cannot provide more information, how about collecting more tasks equipped with rich annotations to help the model learn? The goal of meta-learning is to train a model on a variety of tasks with rich annotations, such that it can solve a new task using only a few labeled samples. The key idea is to train the model's initial parameters such that the model has maximal performance on a new task after the parameters have been updated through zero or a couple of gradient steps. There are already some surveys of meta-learning, such as Vilalta and Drissi (2002), Vanschoren (2018), and Hospedales et al. (2020). Nevertheless, this paper focuses on the NLP domain, especially few-shot applications. We try to provide clearer definitions, a progress summary, and some common datasets for applying meta-learning to few-shot NLP.
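To make the "learn an initialization that adapts in a couple of gradient steps" idea concrete, here is a minimal sketch using the Reptile update, one of the optimization-based methods the survey covers. Everything specific in it is an assumption for illustration, not taken from the paper: the tasks are toy 1-D linear regressions rather than NLP tasks, and the function names (`sample_task`, `adapt`, `reptile`, `few_shot_loss`) and all hyperparameters are hypothetical.

```python
import random

def sample_task(rng):
    # Hypothetical task family: y = a*x + b, with (a, b) varying slightly per task.
    return rng.gauss(3.0, 0.1), rng.gauss(-2.0, 0.1)

def sample_examples(task, rng, n):
    # Draw n labeled (x, y) pairs from one task.
    a, b = task
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, a * x + b) for x in xs]

def mse(theta, data):
    w, c = theta
    return sum((w * x + c - y) ** 2 for x, y in data) / len(data)

def grad(theta, data):
    # Gradient of the mean squared error with respect to (w, c).
    w, c = theta
    n = len(data)
    gw = sum(2 * (w * x + c - y) * x for x, y in data) / n
    gc = sum(2 * (w * x + c - y) for x, y in data) / n
    return gw, gc

def adapt(theta, support, steps, lr):
    # Inner loop: a few plain gradient steps on one task's small support set.
    w, c = theta
    for _ in range(steps):
        gw, gc = grad((w, c), support)
        w, c = w - lr * gw, c - lr * gc
    return w, c

def reptile(rng, meta_iters=500, inner_steps=3, inner_lr=0.1, meta_lr=0.1, k_shot=5):
    # Outer loop: move the shared initialization toward each task's adapted weights.
    theta = (0.0, 0.0)
    for _ in range(meta_iters):
        support = sample_examples(sample_task(rng), rng, k_shot)
        phi = adapt(theta, support, inner_steps, inner_lr)
        theta = tuple(t + meta_lr * (p - t) for t, p in zip(theta, phi))
    return theta

def few_shot_loss(theta, rng, n_tasks=100, steps=2, lr=0.1, k_shot=5):
    # Average query loss on unseen tasks after adapting from theta with k_shot examples.
    total = 0.0
    for _ in range(n_tasks):
        task = sample_task(rng)
        phi = adapt(theta, sample_examples(task, rng, k_shot), steps, lr)
        total += mse(phi, sample_examples(task, rng, 50))
    return total / n_tasks

if __name__ == "__main__":
    theta = reptile(random.Random(0))
    print("meta-learned init:", theta)
    print("few-shot loss from meta-learned init:", few_shot_loss(theta, random.Random(1)))
    print("few-shot loss from zero init:", few_shot_loss((0.0, 0.0), random.Random(1)))
```

In this toy setting, two gradient steps on five examples should give a much lower query loss from the meta-learned initialization than from a zero initialization, which is the behavior the abstract describes. A full MAML implementation would instead differentiate through the inner-loop updates; Reptile is used here only because it needs no second-order gradients.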

Paper Structure

This paper contains 26 sections, 4 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Multi-task learning vs. meta-learning. This figure is adapted from Dou et al. (2019).
  • Figure 2: MAML meta-learning
  • Figure 3: Reptile meta-learning (batched version)