Data-Efficient and Robust Task Selection for Meta-Learning

Donglin Zhan, James Anderson

TL;DR

The Data-Efficient and Robust Task Selection (DERTS) algorithm can be incorporated into both gradient-based and metric-based meta-learning algorithms, and it outperforms existing task-sampling strategies for both kinds of algorithms under limited data budgets and in noisy task settings.

Abstract

Meta-learning methods typically learn tasks under the assumption that all tasks are equally important. However, this assumption often does not hold. In real-world applications, tasks can differ both in their importance at different training stages and in whether they contain noisily labeled data, which makes a uniform treatment suboptimal. To address these issues, we propose the Data-Efficient and Robust Task Selection (DERTS) algorithm, which can be incorporated into both gradient-based and metric-based meta-learning algorithms. During meta-training, DERTS selects weighted subsets of tasks from task pools by minimizing the error with which each subset approximates the full gradient of its task pool. The selected tasks enable rapid training and are robust to label noise. Unlike existing algorithms, DERTS requires no modification of the model architecture and can handle noisily labeled data in both the support and query sets. Analysis of DERTS shows that the algorithm follows training dynamics similar to those of learning on the full task pools. Experiments show that DERTS outperforms existing sampling strategies for meta-learning, for both gradient-based and metric-based meta-learning algorithms, under limited data budgets and in noisy task settings.
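The core selection step described in the abstract, choosing a weighted subset of tasks whose combined gradient approximates the full task-pool gradient, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: we assume each task is represented by a (hypothetical) flattened meta-gradient vector, and, in the style of CRAIG coreset selection, we greedily minimize the upper bound $\sum_i \min_{j \in S} \|g_i - g_j\|$ rather than the approximation error itself, assigning each selected task a weight equal to the number of pool tasks closest to it. The helper names `select_weighted_tasks` and `weighted_gradient` are hypothetical; the paper's exact objective, gradient estimator, and weighting may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def select_weighted_tasks(grads: List[Vector], k: int) -> Tuple[List[int], List[int]]:
    """Greedily pick up to k tasks whose weighted gradients approximate the pool gradient.

    Assumption: the objective is the facility-location / k-medoids cost
    sum_i min_{j in S} ||g_i - g_j||, which upper-bounds the gradient mismatch.
    Each selected task's weight is the number of pool tasks assigned to it.
    """
    n = len(grads)
    d = [[math.dist(grads[i], grads[j]) for j in range(n)] for i in range(n)]
    best = [math.inf] * n  # distance from each task to its nearest selected task
    selected: List[int] = []
    for _ in range(min(k, n)):
        candidates = [j for j in range(n) if j not in selected]
        # Pick the candidate that most reduces the total assignment cost.
        j_star = min(candidates,
                     key=lambda j: sum(min(best[i], d[i][j]) for i in range(n)))
        selected.append(j_star)
        best = [min(best[i], d[i][j_star]) for i in range(n)]
    # gamma_j: how many pool tasks have selected task j as their nearest neighbour.
    weights = [0] * len(selected)
    for i in range(n):
        nearest = min(range(len(selected)), key=lambda s: d[i][selected[s]])
        weights[nearest] += 1
    return selected, weights


def weighted_gradient(grads: List[Vector], selected: List[int],
                      weights: List[int]) -> List[float]:
    """Weighted subset gradient sum_j gamma_j * g_j, used in place of the pool gradient."""
    dim = len(grads[0])
    return [sum(w * grads[j][c] for j, w in zip(selected, weights)) for c in range(dim)]


if __name__ == "__main__":
    # Toy pool: two clusters of similar task gradients (hypothetical values).
    pool = [[1.0, 0.0], [1.1, 0.0], [0.0, 1.0], [0.0, 1.2]]
    sel, w = select_weighted_tasks(pool, k=2)
    full = [sum(g[c] for g in pool) for c in range(2)]
    print("selected:", sel, "weights:", w)
    print("full gradient:", full, "subset estimate:", weighted_gradient(pool, sel, w))
```

On this toy pool the greedy step picks one task from each cluster and gives each a weight of 2, so a meta-update on two tasks stands in for one on all four.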

Paper Structure (31 sections, 2 theorems, 32 equations, 3 figures, 8 tables, 2 algorithms)

Key Result

Theorem 1

Assume that the loss function $\mathcal{L}(f,\mathcal{D})$ satisfies Assumptions A1 to A3 and that $\epsilon$ is an upper bound on the right-hand side of Eq. \eqref{approx}. Then, with appropriate constant learning rates $\eta$ and $\eta'$ for the outer- and inner-loop updates and an initialization point $\theta^0$, applying DERTS … where … and …
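For intuition about where a bound like $\epsilon$ can come from, the following derivation assumes that the approximation equation cited in Theorem 1 (labelled `approx` in the source) compares the full task-pool gradient with a weighted subset gradient, as in CRAIG-style coreset selection. The notation is ours, not necessarily the paper's: $V$ is the task pool, $S \subseteq V$ the selected subset, $g_i(\theta)$ the meta-gradient of task $i$, $\sigma(i) = \arg\min_{j \in S} \|g_i(\theta) - g_j(\theta)\|$ the selected task assigned to $i$, and $\gamma_j$ the weight of selected task $j$.

```latex
\begin{aligned}
\Big\| \sum_{i\in V} g_i(\theta) - \sum_{j\in S} \gamma_j\, g_j(\theta) \Big\|
  &= \Big\| \sum_{i\in V} \big( g_i(\theta) - g_{\sigma(i)}(\theta) \big) \Big\|,
  \qquad \gamma_j = \big|\{\, i \in V : \sigma(i) = j \,\}\big| \\
  &\le \sum_{i\in V} \big\| g_i(\theta) - g_{\sigma(i)}(\theta) \big\|
  = \sum_{i\in V} \min_{j\in S} \big\| g_i(\theta) - g_j(\theta) \big\| .
\end{aligned}
```

The first equality holds because, with these weights, $\sum_{j\in S}\gamma_j g_j = \sum_{i\in V} g_{\sigma(i)}$; the inequality is the triangle inequality. Under this assumption, any $\epsilon$ that dominates the last sum bounds the gradient mismatch, and choosing $S$ to make that sum small is a $k$-medoids-type problem for which greedy submodular maximization is a standard solver.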

Figures (3)

  • Figure 1: DERTS requires task pools to store episodic tasks sampled from task distributions. Using the efficient gradient estimation in Sec. \ref{sub2}, the gradients of all tasks stored in the task pool are computed. Following the approximation formulated in Sec. \ref{sub1} and the optimization objective in Sec. \ref{sub2}, a subset of tasks with corresponding weights is constructed to approximate the task-pool gradient. The meta-model is then trained on these subsets instead of the full task pools.
  • Figure 2: Loss residual and accuracy in noisy task settings (early stage). (a) Test accuracy of ANIL on Mini-ImageNet under the $25\%$ noise setting. (b) Training loss of ANIL on Mini-ImageNet under the $25\%$ noise setting. (c) Test accuracy of PN on Mini-ImageNet under the $40\%$ noise setting. (d) Training loss of PN on Mini-ImageNet under the $40\%$ noise setting.
  • Figure 3: Typical examples of tasks selected by DERTS and of unselected tasks.
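Figure 2 refers to $25\%$ and $40\%$ noise settings, and the abstract states that DERTS handles noisy labels in both the support and query sets. The noise model is not specified here, so the sketch below assumes symmetric (uniform) label flipping: a fixed fraction of labels is reassigned to a different class chosen uniformly at random, applied independently to the support and query labels of an episodic task. The helpers `corrupt_labels` and `make_noisy_task` are hypothetical and only illustrate how such noisy tasks could be constructed.

```python
import random
from typing import List, Tuple


def corrupt_labels(labels: List[int], noise_rate: float, num_classes: int,
                   rng: random.Random) -> List[int]:
    """Assumed symmetric noise: relabel round(noise_rate * n) examples to a wrong class."""
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    for idx in rng.sample(range(len(labels)), n_flip):
        # Choose uniformly among the classes other than the true one.
        wrong = [c for c in range(num_classes) if c != labels[idx]]
        noisy[idx] = rng.choice(wrong)
    return noisy


def make_noisy_task(support_y: List[int], query_y: List[int], noise_rate: float,
                    num_classes: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Apply the same noise rate to the support and the query labels of one task."""
    rng = random.Random(seed)
    return (corrupt_labels(support_y, noise_rate, num_classes, rng),
            corrupt_labels(query_y, noise_rate, num_classes, rng))


if __name__ == "__main__":
    # A 5-way task with 5 support and 15 query examples per class.
    support = [c for c in range(5) for _ in range(5)]
    query = [c for c in range(5) for _ in range(15)]
    s_noisy, q_noisy = make_noisy_task(support, query, noise_rate=0.4, num_classes=5)
    print("flipped support labels:", sum(a != b for a, b in zip(support, s_noisy)))
    print("flipped query labels:", sum(a != b for a, b in zip(query, q_noisy)))
```

Because every selected index is forced to a different class, the number of corrupted labels equals `round(noise_rate * n)` exactly, which keeps the nominal noise rate precise even for small support sets.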

Theorems & Definitions (3)

  • Definition 1: Submodularity
  • Theorem 1: Training Dynamics
  • Proposition 1: Gradient Norm Upper Bound
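Definition 1 concerns submodularity, the diminishing-returns property under which greedy subset selection comes with approximation guarantees. As an illustration only, assuming (as in CRAIG-style coreset selection) that task selection can be written as a facility-location function $F(S) = \sum_i \max_{j \in S} (C - \|g_i - g_j\|)$ over task gradients with $F(\emptyset) = 0$, the sketch below brute-forces the inequality $F(A \cup \{e\}) - F(A) \ge F(B \cup \{e\}) - F(B)$ for all $A \subseteq B$ and $e \notin B$ on a toy pool. The functions `facility_location` and `is_submodular` and the constant `C` are hypothetical; the paper's own definition and objective may differ.

```python
import itertools
import math
from typing import FrozenSet, List, Sequence


def facility_location(S: FrozenSet[int], grads: List[Sequence[float]], C: float) -> float:
    """F(S) = sum_i max_{j in S} (C - ||g_i - g_j||), with F(empty set) = 0."""
    if not S:
        return 0.0
    return sum(max(C - math.dist(g, grads[j]) for j in S) for g in grads)


def is_submodular(grads: List[Sequence[float]], C: float) -> bool:
    """Brute-force check: F(A+e) - F(A) >= F(B+e) - F(B) for all A <= B and e not in B."""
    V = range(len(grads))
    subsets = [frozenset(c) for r in range(len(grads) + 1)
               for c in itertools.combinations(V, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for e in V:
                if e in B:
                    continue
                gain_A = facility_location(A | {e}, grads, C) - facility_location(A, grads, C)
                gain_B = facility_location(B | {e}, grads, C) - facility_location(B, grads, C)
                if gain_A < gain_B - 1e-9:  # tolerance for floating-point error
                    return False
    return True


if __name__ == "__main__":
    pool = [[1.0, 0.0], [1.1, 0.0], [0.0, 1.0], [0.0, 1.2]]
    print("C = 2.0:", is_submodular(pool, C=2.0))
    print("C = 0.0:", is_submodular(pool, C=0.0))
```

Choosing `C` at least as large as the largest pairwise gradient distance keeps every similarity non-negative, which is what makes the diminishing-returns check pass; with `C = 0.0` the similarities become negative and the check fails on this pool, since adding a task to the empty set then has a negative gain.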