Meta-Learning Neural Procedural Biases

Christian Raymond; Qi Chen; Bing Xue; Mengjie Zhang

TL;DR

The paper addresses rapid adaptation in few-shot learning by introducing Neural Procedural Bias Meta-Learning (NPBML), which meta-learns task-adaptive procedural biases across the initialization, the optimizer, and the loss function. It replaces the fixed inner update of MAML with a fully meta-learned rule that warps gradients via a preconditioning matrix $P_{ω}$ and minimizes a meta-learned loss $M_{φ}$, both modulated per task by feature-wise linear modulation (FiLM) layers with parameters $ψ$. The approach jointly optimizes the initialization $θ$, optimizer parameters $ω$, loss parameters $φ$, and FiLM parameters $ψ$, giving each task its own inductive biases and enabling fast adaptation in a few gradient steps. Experiments on Mini-ImageNet, Tiered-ImageNet, CIFAR-FS, and FC-100 show consistent improvements over state-of-the-art gradient-based meta-learning baselines. Overall, NPBML is an end-to-end framework that unifies learned initialization, optimization, and loss design for few-shot learning, with potential extensions to cross-domain and continual learning.
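
The contrast between the two inner updates can be written side by side. This is a notational sketch rather than the paper's exact equations: the step size $\alpha$, the support set $D^{S}$, and the step index $j$ are assumptions introduced here for illustration, and the per-task FiLM modulation by $ψ$ is left implicit.

```latex
% MAML inner step: plain SGD on a fixed base loss
\theta_{j+1} = \theta_{j} - \alpha \, \nabla_{\theta_{j}} \mathcal{L}^{base}(D^{S}, \theta_{j})

% NPBML inner step: preconditioned gradient of a meta-learned loss
\theta_{j+1} = \theta_{j} - \alpha \, P_{\omega} \, \nabla_{\theta_{j}} \mathcal{M}_{\phi}(D^{S}, \theta_{j})
```

Setting $P_{\omega}$ to the identity and $\mathcal{M}_{\phi}$ to $\mathcal{L}^{base}$ in the second line recovers the first, which is one way to see the MAML update as a special case of the learned rule.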

Abstract

The goal of few-shot learning is to generalize and achieve high performance on new unseen learning tasks, where each task has only a limited number of examples available. Gradient-based meta-learning addresses this challenge by learning how to learn new tasks, embedding inductive biases informed by prior learning experiences into the components of the learning algorithm. In this work, we build upon prior research and propose Neural Procedural Bias Meta-Learning (NPBML), a novel framework designed to meta-learn task-adaptive procedural biases. Our approach aims to consolidate recent advancements in meta-learned initializations, optimizers, and loss functions by learning them simultaneously and making them adapt to each individual task to maximize the strength of the learned inductive biases. This imbues each learning task with a unique set of procedural biases that are specifically designed and selected to attain strong learning performance in only a few gradient steps. The experimental results show that by meta-learning the procedural biases of a neural network, we can induce strong inductive biases towards a distribution of learning tasks, enabling robust learning performance across many well-established few-shot learning benchmarks.

Paper Structure (29 sections, 16 equations, 4 figures, 4 tables, 2 algorithms)

Introduction
Background
Model Agnostic Meta-Learning
Neural Procedural Bias Meta-Learning
Overview
Meta-Learned Optimizer
Meta-Learned Loss Function
Task-Adaptive Modulation
Initialization
Implicit Meta-Learning
Related Work
Experimental Evaluation
Results and Analysis
Mini-ImageNet and Tiered-ImageNet
CIFAR-FS and FC-100
...and 14 more sections

Figures (4)

Figure 1: In NPBML, the procedural biases of a deep neural network are meta-learned. This involves meta-learning three key components: the loss function (left), the parameter initialization (center), and the optimizer (right). By meta-learning these components, a strong inductive bias towards fast adaptation can be induced into the learning algorithm.
Figure 2: In MAML, the update rule $\mathrm{U}^{MAML}$ optimizes the base model parameters from a shared initialization using simple SGD minimizing $\mathcal{L}^{base}$. In contrast, NPBML adapts the model parameters from a task-adapted initialization using $\mathrm{U}^{NPBML}$, a task-adaptive update rule employing a meta-learned preconditioning matrix $P_{\bm{\omega}}$ and loss function $\mathcal{M}_{\phi}$.
Figure 3: An example of a two-layer convolutional neural network in NPBML, where layers $\theta^{(1)}$ and $\theta^{(2)}$ are interleaved with warp preconditioning layers $\bm{\omega}^{(1)}$ and $\bm{\omega}^{(2)}$. Both types of layers are modulated in the inner loop using feature-wise linear modulation layers to induce task adaptation.
Figure 4: An example of one of the meta-learned loss functions in NPBML, where the loss function, represented as a composition of feed-forward (linear) layers, is modulated using feature-wise linear modulation, resulting in a task-adaptive meta-learned loss function.
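
The mechanisms named in the captions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the preconditioner is assumed to be diagonal (NPBML realizes it through warp layers rather than an explicit matrix), and the gradient is passed in directly instead of being computed from a meta-learned loss.

```python
from typing import List

Vector = List[float]


def film(features: Vector, gamma: Vector, beta: Vector) -> Vector:
    """Feature-wise linear modulation: scale and shift each feature."""
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


def sgd_step(theta: Vector, grad: Vector, lr: float) -> Vector:
    """Plain SGD inner step, as used by MAML."""
    return [t - lr * g for t, g in zip(theta, grad)]


def preconditioned_step(
    theta: Vector, grad: Vector, precond_diag: Vector, lr: float
) -> Vector:
    """Inner step with a diagonal preconditioner (a simplifying assumption)."""
    return [t - lr * p * g for t, p, g in zip(theta, precond_diag, grad)]


if __name__ == "__main__":
    theta = [1.0, -2.0]
    grad = [0.5, 0.5]  # stand-in for the gradient of the loss
    print(sgd_step(theta, grad, lr=0.1))
    # The preconditioner rescales the first coordinate and freezes the second.
    print(preconditioned_step(theta, grad, precond_diag=[2.0, 0.0], lr=0.1))
    # FiLM scales and shifts each feature with task-specific gamma and beta.
    print(film([1.0, 2.0], gamma=[2.0, 0.5], beta=[0.0, 1.0]))
```

With `precond_diag` set to all ones, `preconditioned_step` reduces to `sgd_step`, which mirrors how the learned update rule generalizes the fixed SGD update.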
