Bootstrapped Meta-Learning

Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

TL;DR

The paper introduces Bootstrapped Meta-Gradients (BMG), a meta-learning framework that combats short-horizon myopia and curvature issues by bootstrapping a future target from the meta-learner itself and minimising a distance to that target. A Target Bootstrap (TB) unrolls past the immediate K updates to produce a bootstrapped target without backpropagating through it, while a matching function μ (e.g., KL divergence) shapes the meta-optimisation landscape. The authors prove local performance improvements and demonstrate substantial empirical gains across Atari, non-stationary grid-worlds, and multi-task few-shot learning, including improved exploration in Q-learning and improved data and compute efficiency in MiniImagenet MAML-style setups. Overall, BMG provides a principled way to extend the effective meta-learning horizon and stabilise meta-optimisation, with practical benefits in both reinforcement learning and few-shot transfer settings.

Abstract

Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem. We propose an algorithm that tackles this problem by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. Focusing on meta-learning with gradients, we establish conditions that guarantee performance improvements and show that the metric can control meta-optimisation. Meanwhile, the bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. We achieve a new state-of-the-art for model-free agents on the Atari ALE benchmark and demonstrate that the algorithm yields both performance and efficiency gains in multi-task meta-learning. Finally, we explore how bootstrapping opens up new possibilities and find that it can meta-learn efficient exploration in an epsilon-greedy Q-learning agent, without backpropagating through the update rule.
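
The two-stage procedure described in the abstract (bootstrap a target, then minimise a distance to it) can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the text above: the toy objective $f(x) = \tfrac{1}{2} a x^2$, a single meta-parameter `w` acting as the inner learning rate, a target built from `L - 1` further inner updates followed by one plain gradient step of size `alpha`, and a squared-L2 matching function. The names `grad_f`, `inner_unroll`, and `bmg_meta_step` are hypothetical.

```python
def grad_f(x, a=1.0):
    """Gradient of the assumed toy objective f(x) = 0.5 * a * x**2."""
    return a * x


def inner_unroll(x, w, steps, a=1.0):
    """Apply `steps` inner updates x <- x - w * grad_f(x).

    Returns the final x together with dx/dw, tracked in forward mode,
    treating the starting x as independent of w.
    """
    dx_dw = 0.0
    for _ in range(steps):
        g = grad_f(x, a)
        # d/dw [x - w * a * x] = dx_dw - a * x - w * a * dx_dw
        dx_dw = dx_dw - g - w * a * dx_dw
        x = x - w * g
    return x, dx_dw


def bmg_meta_step(x0, w, K=1, L=3, alpha=0.1, beta=0.05, a=1.0):
    """One bootstrapped meta-update of w (illustrative sketch only)."""
    # 1. K inner updates; only this part is differentiated w.r.t. w.
    x_K, dxK_dw = inner_unroll(x0, w, K, a)

    # 2. Bootstrap a target by continuing past x_K (assumed form:
    #    L - 1 more inner updates, then one plain gradient step).
    #    The target is treated as a constant: its derivative is discarded.
    x_boot, _ = inner_unroll(x_K, w, L - 1, a)
    target = x_boot - alpha * grad_f(x_boot, a)

    # 3. Matching loss mu = (target - x_K)**2, with the gradient
    #    flowing through x_K only.
    loss = (target - x_K) ** 2
    dloss_dw = -2.0 * (target - x_K) * dxK_dw

    return w - beta * dloss_dw, loss


if __name__ == "__main__":
    w, x0 = 0.1, 1.0
    for step in range(5):
        w, loss = bmg_meta_step(x0, w)
        print(f"meta-step {step}: w = {w:.4f}, matching loss = {loss:.4f}")
```

In this toy setting the target lies further along the descent direction than the K-step iterate, so the meta-update increases the learning rate `w`, while the backward pass covers only the first `K` updates. A KL divergence between policies, as mentioned in the TL;DR, would take the place of the squared distance in step 3.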

Paper Structure

This paper contains 47 sections, 6 theorems, 29 equations, 19 figures, 9 tables, 7 algorithms.

Key Result

Lemma 1

Let $\mathbf{w}^\prime$ be given by eq:mg. For $\beta$ sufficiently small, $f(\mathbf{x}^{(K)}(\mathbf{w}^\prime)) - f(\mathbf{x}^{(K)}(\mathbf{w})) = -\beta \| \ldots$

Figures (19)

  • Figure 1: Bootstrapped Meta-Gradients.
  • Figure 2: Non-stationary grid-world. Left: Comparison of total returns under an actor-critic agent over 50 seeds. Right: Learned entropy-regularization schedules. The figure depicts the average regularization weight ($\epsilon$) over 4 task-cycles at 6M steps in the environment.
  • Figure 3: BMG $\varepsilon$-greedy exploration under a $Q(\lambda)$-agent.
  • Figure 4: Human-normalized score across the 57 games in Atari ALE. Left: per-game difference in score between BMG and our implementation of STACX$^*$ at 200M frames. Right: Median scores over learning compared to published baselines. Shading depicts standard deviation across 3 seeds.
  • Figure 5: Ablations on Atari. Left: human-normalized score decomposition of TB w.r.t. optimizer (SGD, RMS), matching function (L2, KL, KL & V), and bootstrap steps ($L$). BMG with (SGD, L2, $L\!=\!1$) is equivalent to STACX. Center: episode return on Ms Pacman for different $L$. Right: distribution of episode returns over all 57 games, normalized per-game by mean and standard deviation. All results are reported between 190-200M frames over 3 independent seeds.
  • ...and 14 more figures

Theorems & Definitions (9)

  • Lemma 1: MG Descent
  • Theorem 1: BMG Descent
  • Corollary 1
  • Lemma 1: MG Descent
  • proof
  • Theorem 1: BMG Descent
  • proof
  • Corollary 1
  • proof