Universal Algorithm-Implicit Learning

Stefano Woerner, Seong Joon Oh, Christian F. Baumgartner

TL;DR

A theoretical framework for meta-learning is introduced which formally defines practical universality and introduces a distinction between algorithm-explicit and algorithm-implicit learning, providing a principled vocabulary for reasoning about universal meta-learning methods.

Abstract

Current meta-learning methods are constrained to narrow task distributions with fixed feature and label spaces, limiting applicability. Moreover, the current meta-learning literature uses key terms like "universal" and "general-purpose" inconsistently and lacks precise definitions, hindering comparability. We introduce a theoretical framework for meta-learning which formally defines practical universality and introduces a distinction between algorithm-explicit and algorithm-implicit learning, providing a principled vocabulary for reasoning about universal meta-learning methods. Guided by this framework, we present TAIL, a transformer-based algorithm-implicit meta-learner that functions across tasks with varying domains, modalities, and label configurations. TAIL features three innovations over prior transformer-based meta-learners: random projections for cross-modal feature encoding, random injection label embeddings that extrapolate to larger label spaces, and efficient inline query processing. TAIL achieves state-of-the-art performance on standard few-shot benchmarks while generalizing to unseen domains. Unlike other meta-learning methods, it also generalizes to unseen modalities, solving text classification tasks despite training exclusively on images, handles tasks with up to 20$\times$ more classes than seen during training, and provides orders-of-magnitude computational savings over prior transformer-based approaches.
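The abstract names random projections for cross-modal feature encoding as the first innovation. Below is a minimal sketch of the idea. It assumes a fixed Gaussian matrix per modality with entries drawn from $\mathcal{N}(0, 1/D)$ for a shared width $D$. The width `SHARED_DIM = 8`, the encoder widths 512 and 384, and the helper names `make_random_projection` and `project` are all hypothetical. This summary does not give the paper's actual distribution, dimension, or resampling schedule.

```python
import math
import random
from typing import List

SHARED_DIM = 8  # hypothetical width of the modality-agnostic space


def make_random_projection(in_dim: int, out_dim: int = SHARED_DIM, seed: int = 0) -> List[List[float]]:
    """Fixed (out_dim x in_dim) Gaussian matrix with entries ~ N(0, 1/out_dim).

    With this scaling, E[||R x||^2] = ||x||^2, so feature norms are preserved
    in expectation regardless of the encoder's output width.
    """
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]


def project(features: List[float], matrix: List[List[float]]) -> List[float]:
    """Map one encoder feature vector into the shared space (matrix-vector product)."""
    if len(features) != len(matrix[0]):
        raise ValueError("feature width does not match the projection matrix")
    return [sum(w * f for w, f in zip(row, features)) for row in matrix]


# Usage: two modalities with different (hypothetical) encoder widths.
image_proj = make_random_projection(in_dim=512, seed=1)
text_proj = make_random_projection(in_dim=384, seed=2)
z_image = project([0.1] * 512, image_proj)
z_text = project([0.2] * 384, text_proj)
print(len(z_image), len(z_text))  # prints: 8 8
```

Each modality keeps its own matrix sized to its encoder's output. Features of different widths therefore end up in one common space that a single transformer can consume, and the $1/D$ variance keeps feature norms unchanged in expectation.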

Paper Structure (34 sections, 7 theorems, 14 equations, 4 figures, 8 tables)

Key Result

Theorem 1.1

Let $\mathcal{X}$ be a feature space, $\mathcal{Y}$ a label space, $S = \{(x_i, y_i)\}_{i=1}^{n}$ a support dataset, and $(x,y)$ a query sample. For any permutation $\sigma$ of $\mathcal{Y}$, let $S^{\sigma} = \{(x_i, \sigma(y_i))\}_{i=1}^{n}$. Then $g_\theta(S^{\sigma}, x) \overset{d}{=} \sigma\big(g_\theta(S, x)\big)$, i.e. $g_\theta(S, x)$ is equivariant in distribution to the reindexing of $\mathcal{Y}$.
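The equivariance statement can be checked exactly on a toy problem. The sketch assumes that $g_\theta(S, x) = \pi^{-1}(h(S_\pi, x))$. Here $\pi$ is a uniformly random injection of the task's labels into an embedding dictionary, following the random injection label embedding named in the abstract and Figure 1. The function $h$ is a deterministic predictor over dictionary indices. The predictor `biased_predictor` is a hypothetical stand-in for the transformer and is deliberately biased toward low indices. The check relabels the support set by a permutation $\sigma$ to get $S^\sigma$. It then enumerates every injection and compares the law of $g_\theta(S^\sigma, x)$ against the law of $\sigma(g_\theta(S, x))$.

```python
from collections import Counter
from itertools import permutations
from typing import List, Tuple

Support = List[Tuple[float, str]]  # (1-d feature, label) pairs; a toy stand-in for S


def biased_predictor(embedded: List[Tuple[float, int]], x: float) -> int:
    """Hypothetical stand-in h for the transformer: among the two support points
    nearest to x, return the smallest dictionary index. It is deliberately biased
    toward low indices, so it is not equivariant on its own."""
    nearest = sorted(embedded, key=lambda p: abs(p[0] - x))[:2]
    return min(idx for _, idx in nearest)


def prediction_distribution(support: Support, x: float, dict_size: int) -> Counter:
    """Exact distribution of g(S, x) = pi^{-1}(h(S_pi, x)), where pi is a uniformly
    random injection from the task's labels into {0, ..., dict_size - 1}.
    Enumerating every injection once gives each the same probability."""
    labels = sorted({y for _, y in support})
    dist: Counter = Counter()
    for image in permutations(range(dict_size), len(labels)):
        pi = dict(zip(labels, image))                # label -> dictionary index
        inverse = {idx: y for y, idx in pi.items()}  # dictionary index -> label
        embedded = [(xi, pi[yi]) for xi, yi in support]
        dist[inverse[biased_predictor(embedded, x)]] += 1
    return dist


support: Support = [(0.0, "a"), (1.0, "b"), (1.2, "c"), (3.0, "a")]
x = 1.1
sigma = {"a": "c", "b": "a", "c": "b"}  # a permutation of the label space
relabeled = [(xi, sigma[yi]) for xi, yi in support]

lhs = prediction_distribution(relabeled, x, dict_size=5)  # law of g(S^sigma, x)
base = prediction_distribution(support, x, dict_size=5)
rhs = Counter({sigma[y]: n for y, n in base.items()})     # law of sigma(g(S, x))
print(lhs == rhs)  # True
```

One way to see why the equality holds for any deterministic `h`: $\pi \circ \sigma$ is again a uniformly random injection. The randomized label assignment, rather than the predictor, therefore removes the dependence on how labels are indexed.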

Figures (4)

  • Figure 1: Method overview. The input is encoded with a modality-appropriate pretrained encoder and then projected to a common modality-agnostic space. The labels are embedded using a randomized injection into a learnable embedding dictionary. The input and label embeddings are concatenated and form the input tokens for a transformer encoder. A linear classification head makes a prediction in label embedding space, which is then remapped to the original set of labels.
  • Figure 2: (a): Performance degradation with increasing number of classes (1-shot setting). (b) and (c): Wall-clock time for 1000 test episodes as a function of task size, shown at two different scales to relate it to the algorithm-explicit baselines and to the meta-learning baselines. (d): Memory usage during training as a function of task size. (e): Wall-clock time for 1000 training episodes.
  • Figure 3: Performance degradation with increasing number of classes (5-shot setting).
  • Figure 4: Validation loss curves for scheduled addition of more embeddings to the embedding dictionary.
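The token construction and the final remapping step described for Figure 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes one-hot dictionary rows and an all-zero label slot for query tokens, since the summary does not say how queries are marked. It also assumes dot-product scoring of the head output against the task's label embeddings. `inject_labels`, `build_tokens`, and `decode` are hypothetical names, and the transformer itself is omitted.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def inject_labels(labels: Sequence[str], dictionary: Sequence[Vector],
                  rng: random.Random) -> Dict[str, Vector]:
    """Assign each task label a distinct, randomly chosen row of the embedding dictionary."""
    rows = rng.sample(range(len(dictionary)), len(labels))  # raises if too few rows
    return {y: list(dictionary[r]) for y, r in zip(labels, rows)}


def build_tokens(support_feats: Sequence[Vector], support_labels: Sequence[str],
                 query_feats: Sequence[Vector], label_emb: Dict[str, Vector]) -> List[Vector]:
    """Concatenate each projected input with its label embedding."""
    label_dim = len(next(iter(label_emb.values())))
    tokens = [list(f) + label_emb[y] for f, y in zip(support_feats, support_labels)]
    # Assumption: query tokens carry an all-zero label slot.
    tokens += [list(f) + [0.0] * label_dim for f in query_feats]
    return tokens


def decode(head_output: Vector, label_emb: Dict[str, Vector]) -> str:
    """Score the head output against this task's label embeddings (dot product)
    and return the original label of the best match (the 'remap' step)."""
    return max(label_emb, key=lambda y: sum(a * b for a, b in zip(head_output, label_emb[y])))


rng = random.Random(0)
dictionary = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # 4 one-hot rows
emb = inject_labels(["cat", "dog"], dictionary, rng)
tokens = build_tokens([[0.1, 0.2]], ["cat"], [[0.3, 0.4]], emb)
print(len(tokens), len(tokens[0]))  # 2 6
print(decode(emb["dog"], emb))      # dog
```

Because the head predicts a point in label embedding space rather than a fixed-size logit vector, the same head can serve any number of classes up to the dictionary size.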

Theorems & Definitions (20)

  • Definition 2.1: Learning Algorithm
  • Definition 3.1: Demonstration-Conditioned Inference
  • Definition 3.2: Universal Consistency
  • Definition 3.3: Learning Curve
  • Definition 3.4: Valid Learning Algorithm
  • Definition 3.5: Universal Validity
  • Theorem 1.1: Equivariance to label re-indexing
  • Proof: Proof of Theorem 1.1
  • Proposition 1.2: Unbiased gradients
  • Proposition 1.3: Coverage over $t$ episodes
  • ...and 10 more
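Proposition 1.3 ("Coverage over $t$ episodes") is listed here only by title, so its exact statement is not available. Suppose each episode injects $k$ labels into $k$ distinct entries chosen uniformly from a dictionary of $m$ entries, independently across episodes. This is an assumption, not the paper's stated setting. Under it, the standard calculation gives the probability that a fixed entry is used at least once in $t$ episodes as $1 - (1 - k/m)^t$. A union bound then shows that all $m$ entries are used with probability at least $1 - \delta$ once $m(1 - k/m)^t \le \delta$. The helper names below are hypothetical, and the Monte Carlo function only cross-checks the closed form.

```python
import random


def coverage_probability(k: int, m: int, t: int) -> float:
    """P(a fixed dictionary entry is used in at least one of t episodes) when each
    episode injects k labels into k distinct entries chosen uniformly from m."""
    return 1.0 - (1.0 - k / m) ** t


def episodes_for_full_coverage(k: int, m: int, delta: float) -> int:
    """Smallest t with m * (1 - k/m)^t <= delta. By a union bound over the m entries,
    every entry is then used at least once with probability >= 1 - delta."""
    if not 0 < k <= m:
        raise ValueError("need 0 < k <= m")
    t = 0
    while m * (1.0 - k / m) ** t > delta:
        t += 1
    return t


def simulate_coverage(k: int, m: int, t: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected fraction of entries used within t episodes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        used = set()
        for _ in range(t):
            used.update(rng.sample(range(m), k))
        total += len(used) / m
    return total / trials


print(coverage_probability(5, 10, 2))           # 0.75
print(episodes_for_full_coverage(5, 10, 0.01))  # 10
```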