Set-based Meta-Interpolation for Few-Task Meta-Learning

Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

TL;DR

Meta-Interpolation is a novel domain-agnostic task augmentation method that uses expressive neural set functions to densify the meta-training task distribution via bilevel optimization. The paper also proves that task interpolation with the set function regularizes the meta-learner to improve generalization.

Abstract

Meta-learning approaches enable machine learning systems to adapt to new tasks given few examples by leveraging knowledge from related tasks. However, a large number of meta-training tasks are still required for generalization to unseen tasks during meta-testing. This is a critical bottleneck for real-world problems that come with only a few tasks, for reasons that include the difficulty and cost of constructing tasks. Recently, several task augmentation methods have been proposed to tackle this issue, using domain-specific knowledge to design augmentation techniques that densify the meta-training task distribution. However, such reliance on domain-specific knowledge renders these methods inapplicable to other domains. While Manifold Mixup-based task augmentation methods are domain-agnostic, we empirically find them ineffective on non-image domains. To tackle these limitations, we propose a novel domain-agnostic task augmentation method, Meta-Interpolation, which utilizes expressive neural set functions to densify the meta-training task distribution using bilevel optimization. We empirically validate the efficacy of Meta-Interpolation on eight datasets spanning various domains such as image classification, molecule property prediction, text classification, and speech recognition. Experimentally, we show that Meta-Interpolation consistently outperforms all the relevant baselines. Theoretically, we prove that task interpolation with the set function regularizes the meta-learner to improve generalization.

Paper Structure (42 sections, 3 theorems, 88 equations, 9 figures, 18 tables, 2 algorithms)

Key Result

Theorem 1

For any $J\in \mathbb{N}_+$, if $c \mapsto d(y,c)$ is $J$-times differentiable for all $y$, then the $J$-th order approximation of $\mathcal{L}_{\text{mix}}(\lambda, \theta, \hat{\mathcal{T}} _{t,t'})$ is given by $\mathcal{L}_{\text{singleton}}\left(\lambda, \theta; \mathcal{T}_{t} \right)+ \sum_{

Figures (9)

  • Figure 1: Concept. Three-way one-shot classification problem. (a) A new class is assigned to a pair of classes sampled without replacement from the pool of meta-training tasks. (b) The support sets are interpolated with a set function and paired with a query set. (c) Bilevel optimization of the set function and meta-learner.
  • Figure 2: (a)$\sim$(d) Meta-train and meta-validation loss on RMNIST and NCI for ProtoNet, MLTI, MetaMix, ProtoNet+ST, and Meta-Interpolation.
  • Figure 3: Visualization of original and interpolated tasks from NCI ((a) and (b)) and ESC-50 ((c) and (d)).
  • Figure 3: Ablation study on ESC-50 dataset.
  • Figure 4: Accuracy as a function of set size.
  • ...and 4 more figures
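
Steps (a) and (b) of the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a small DeepSets-style set function (sum pooling between two linear maps) standing in for the paper's expressive neural set function, randomly initialized weights instead of weights learned by bilevel optimization, and classes paired by index rather than sampled without replacement from a task pool. The names `set_function`, `interpolate_support`, `w_elem`, and `w_out` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def set_function(elements, w_elem, w_out):
    """Permutation-invariant map over a set of feature vectors.

    elements: array of shape (set_size, dim), one row per set element.
    Assumption: a DeepSets-style stand-in, not the paper's architecture.
    """
    encoded = np.tanh(elements @ w_elem)  # encode each element independently
    pooled = encoded.sum(axis=0)          # sum pooling ignores element order
    return np.tanh(pooled @ w_out)        # map the pooled summary back to dim


def interpolate_support(support_a, support_b, w_elem, w_out):
    """Build one new one-shot support set from two tasks.

    support_a, support_b: arrays of shape (n_way, dim), one row per class.
    Class k of the new task comes from the pair (class k of A, class k of B).
    """
    return np.stack([
        set_function(np.stack([a, b]), w_elem, w_out)
        for a, b in zip(support_a, support_b)
    ])


# Three-way one-shot setting, as in the Figure 1 caption.
n_way, dim, hidden = 3, 8, 16
w_elem = rng.normal(scale=0.5, size=(dim, hidden))
w_out = rng.normal(scale=0.5, size=(hidden, dim))

task_a = rng.normal(size=(n_way, dim))  # support features of one task
task_b = rng.normal(size=(n_way, dim))  # support features of another task

new_support = interpolate_support(task_a, task_b, w_elem, w_out)
swapped = interpolate_support(task_b, task_a, w_elem, w_out)

print(new_support.shape)                  # one interpolated vector per new class
print(np.allclose(new_support, swapped))  # order of the two tasks does not matter
```

Because the pooling is a sum, swapping the two source tasks yields the same interpolated support set, which is the property that makes a set function a natural fit for combining tasks. In the paper's setting the set function's parameters would be trained in the outer loop of the bilevel optimization described in step (c), which this sketch omits.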

Theorems & Definitions (3)

  • Theorem 1
  • Proposition 1
  • Proposition 2