Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling

Wonho Bae, Jing Wang, Danica J. Sutherland

TL;DR

This work investigates actively selecting which points to label for the context set in meta-learning. It shows that active context selection during meta-training yields little to no gain, while selection at deployment time can significantly improve performance. The authors introduce a Gaussian Mixture Model (GMM) based selection method that leverages meta-learned representations, with a theoretical motivation showing Bayes-optimality under stylized conditions. Empirically, GMM-based active selection outperforms uncertainty-based and other low-budget strategies on few-shot classification, cross-domain tasks, and regression, for multiple meta-learning algorithms. The results offer a simple, robust, and broadly applicable approach to reducing labeling costs in real-world meta-learning systems.

Abstract

Most meta-learning methods assume that the (very small) context set used to establish a new task at test time is passively provided. In some settings, however, it is feasible to actively select which points to label; the potential gain from a careful choice is substantial, but the setting requires major differences from typical active learning setups. We clarify the ways in which active meta-learning can be used to label a context set, depending on which parts of the meta-learning process use active learning. Within this framework, we propose a natural algorithm based on fitting Gaussian mixtures for selecting which points to label; though simple, the algorithm also has theoretical motivation. The proposed algorithm outperforms state-of-the-art active learning methods when used with various meta-learning algorithms across several benchmark datasets.
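
The idea of selecting context points by fitting a Gaussian mixture can be illustrated with a minimal sketch. The details here are assumptions, not the authors' implementation: the mixture uses a single shared isotropic variance, EM is initialized by greedy farthest-point selection, and the point nearest each fitted mean is chosen for labeling. The names `fit_spherical_gmm` and `select_context` are hypothetical, and `feats` stands in for features produced by a meta-learned encoder.

```python
import numpy as np

def fit_spherical_gmm(feats, k, n_iter=50):
    """EM for a k-component GMM with one shared isotropic variance (a simplification)."""
    n, d = feats.shape
    # Assumed initialization: greedy farthest-point selection (deterministic).
    idx = [0]
    d2 = ((feats - feats[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((feats - feats[nxt]) ** 2).sum(axis=1))
    means = feats[idx].copy()
    weights = np.full(k, 1.0 / k)
    var = feats.var() + 1e-6
    for _ in range(n_iter):
        # E-step: responsibilities from squared distances, computed in log space.
        sq = ((feats[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        log_r = np.log(weights)[None, :] - 0.5 * sq / var
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and the shared variance.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ feats) / nk[:, None]
        sq = ((feats[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        var = (resp * sq).sum() / (n * d) + 1e-6
    return means

def select_context(feats, k):
    """Return, for each mixture component, the index of the unlabeled point nearest its mean."""
    means = fit_spherical_gmm(feats, k)
    sq = ((feats[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return sq.argmin(axis=0)

# Toy usage: three well-separated 2-D blobs standing in for encoder features.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
demo_feats = np.concatenate([c + 0.5 * rng.normal(size=(30, 2)) for c in centers])
demo_picks = select_context(demo_feats, k=3)  # indices of points to send for labeling
```

Choosing points near the component means is one way to make a very small labeling budget cover distinct regions of the feature space; with a budget of one label per class, k would be set to the number of classes.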

Paper Structure

This paper contains 39 sections, 5 theorems, 22 equations, 9 figures, 14 tables, 1 algorithm.

Key Result

Proposition

Suppose that $\{ x_i \}_{i=1}^N$ are orthonormal. Then the solution to the multi-class SVM problem (eq:multiclass-svm) with the dataset $\{ (x_y, y) \}_{y=1}^N$ is given by $w_y = x_y - \frac{1}{N} \sum_{i=1}^N x_i$, and hence ...
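
The closed form $w_y = x_y - \frac{1}{N} \sum_{i=1}^N x_i$ can be sanity-checked numerically. The sketch below does not solve the SVM problem (its formulation is not reproduced in this excerpt); it only confirms two consequences of the formula for orthonormal inputs: $w_y^\top x_j = \delta_{yj} - \frac{1}{N}$, so each training point beats every other class by a margin of exactly 1, and the weight vectors sum to zero. The function name `orthonormal_svm_weights` and the sizes `N`, `d` are illustrative assumptions.

```python
import numpy as np

def orthonormal_svm_weights(X):
    """Rows of X are x_1..x_N; returns rows w_y = x_y - (1/N) * sum_i x_i."""
    return X - X.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
N, d = 5, 8
# Reduced QR of a random (d, N) matrix gives N orthonormal columns in R^d.
Q, _ = np.linalg.qr(rng.normal(size=(d, N)))
X = Q.T                       # row y is x_y
W = orthonormal_svm_weights(X)

scores = W @ X.T              # scores[y, j] = w_y^T x_j
margins = np.diag(scores)[None, :] - scores  # margins[y, j] = w_j^T x_j - w_y^T x_j
```

For every off-diagonal entry, `margins[y, j]` equals 1, which is what one would expect if each margin constraint is tight at the solution.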

Figures (9)

  • Figure 1: Meta-training process. $\operatorname{Pick}_\theta$ can be stratified or unstratified, active or passive.
  • Figure 2: Decision boundaries using a multi-class SVM (eq:multiclass-svm) trained on a one-shot dataset containing (a) cluster centers (stars) and (b) randomly selected points (circles).
  • Figure 3: Left. t-SNE of unlabeled points of one 5-way, 1-shot, unstratified MiniImageNet task. Stars denote the context points selected by each method. Right. Distributions of the number of classes selected in each $\widetilde{\mathcal{C}}$ by ProtoNet on MiniImageNet over 600 meta-test cases, along with the mean empirical entropy of $y$ from $\widetilde{\mathcal{C}}$. Higher entropy means more diverse classes are selected; $\log 5 \approx 1.6$ would be perfect.
  • Figure 4: Decision boundaries using a multi-class SVM (eq:multiclass-svm) trained on cluster centers (shown by stars), with (a) the one-shot case and (b) the three-shot case.
  • Figure 5: Quality of the selected data points on MiniImageNet with ANIL, assessed via the distribution of (a) the distance between each unlabeled point and its closest selected point, and (b) whether the true label of each unlabeled point matches the label of its closest selected point. Red dotted lines show mean values.
  • ...and 4 more figures
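
The entropy figure quoted in the Figure 3 caption can be reproduced with a short sketch. It assumes the empirical entropy is the Shannon entropy (natural log) of the class frequencies among the selected context labels; the exact computation used in the paper is not given in this excerpt, and the function name `label_entropy` is hypothetical.

```python
import numpy as np

def label_entropy(labels):
    """Shannon entropy (in nats) of the empirical class distribution of `labels`."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

# One point from each of 5 classes attains the maximum, log 5.
balanced = label_entropy([0, 1, 2, 3, 4])
# Two classes missed: the entropy drops below log 5.
skewed = label_entropy([0, 0, 1, 1, 2])
```

Under this assumption, a selection that covers all five classes once gives `log 5`, and any selection that repeats a class gives a strictly smaller value.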

Theorems & Definitions (8)

  • Proposition
  • Corollary
  • Corollary
  • Proof
  • Proposition
  • Proof
  • Lemma
  • Proof