More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms

Hossein Zakerinia; Amin Behjati; Christoph H. Lampert

TL;DR

The paper extends PAC-Bayesian theory to meta-learning by modeling knowledge transfer as learning learning algorithms, not just priors. It introduces two generalization bounds that apply to a broad set of learning algorithms, including hypernetworks, representations, and optimization-based methods, by employing a meta-posterior over algorithms and hyper-posteriors over priors. The bounds separate task-level generalization from within-task generalization and accommodate algorithm-specific hyper-priors, enabling flexible, environment-adaptive transfer. Empirical studies show that the new bounds yield tighter estimates and that decoupling initialization from regularization can improve performance on low-data meta-learning tasks. Overall, the framework covers a wider range of meta-learning mechanisms than prior-based analyses and can guide the design of new ones.

Abstract

We introduce a new framework for studying meta-learning methods using PAC-Bayesian theory. Its main advantage over previous work is that it allows for more flexibility in how the transfer of knowledge between tasks is realized. For previous approaches, this could only happen indirectly, by means of learning prior distributions over models. In contrast, the new generalization bounds that we prove express the process of meta-learning much more directly as learning the learning algorithm that should be used for future tasks. The flexibility of our framework makes it suitable for analyzing a wide range of meta-learning mechanisms and even for designing new ones. Beyond our theoretical contributions, we also show empirically that our framework improves the prediction quality in practical meta-learning mechanisms.

Paper Structure (34 sections, 12 theorems, 61 equations, 1 figure, 3 tables)

Introduction
Background
PAC-Bayesian Learning
PAC-Bayesian Meta-Learning
Main Results
Discussion
Complexity terms
Hyper-posteriors
Difference between the theorems
Comparison with previous works
Recovering common meta-learning methods
Proof Sketch
Part I
Part II
Proof of Theorem \ref{theorem:Main_pa}
...and 19 more sections

Key Result

Theorem 3.1

For any fixed meta-prior $\pi$, fixed hyper-prior mapping $\mathcal{P}$, and any $\delta>0$, with probability at least $1-\delta$ over the sampling of the training tasks, for all distributions $\rho \in\mathcal{M}(\mathcal{A})$ over algorithms, and for all hyper-posterior mappings $\mathcal{Q}$ …

Figures (1)

Figure 1: Numeric values of different meta-learning bounds (empirical loss plus complexity terms) for the binary classification task described in Section \ref{sec:numerical_comparison}. Values below $1$ are called non-vacuous.
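
To make the "empirical loss plus complexity term" reading of such a bound concrete, here is a minimal sketch that evaluates the classical single-task PAC-Bayes bound in its McAllester/Maurer form for losses in $[0, 1]$. It is not one of the paper's meta-learning bounds, whose complexity terms are not reproduced on this page. The function names and the input values are illustrative assumptions.

```python
import math


def classical_pac_bayes_bound(emp_loss: float, kl: float, n: int, delta: float = 0.05) -> float:
    """Classical single-task PAC-Bayes bound (McAllester/Maurer form).

    Assumption: this is the textbook bound for losses in [0, 1], shown only
    for illustration; it is NOT the meta-learning bound of the paper.

    emp_loss: empirical loss of the posterior, in [0, 1]
    kl:       KL divergence between posterior and prior (>= 0)
    n:        number of training samples
    delta:    failure probability of the bound
    """
    if not 0.0 <= emp_loss <= 1.0:
        raise ValueError("emp_loss must lie in [0, 1]")
    if kl < 0.0 or n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need kl >= 0, n > 0 and 0 < delta < 1")
    # Complexity term: sqrt((KL + ln(2 * sqrt(n) / delta)) / (2 * n))
    complexity = math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))
    return emp_loss + complexity


def is_non_vacuous(bound_value: float) -> bool:
    """A bound on a [0, 1]-valued loss is informative only if it is below 1."""
    return bound_value < 1.0


# Hypothetical inputs, chosen only to show both regimes.
tight = classical_pac_bayes_bound(emp_loss=0.1, kl=5.0, n=1000)
loose = classical_pac_bayes_bound(emp_loss=0.1, kl=5000.0, n=100)
print(f"tight: {tight:.3f}, non-vacuous: {is_non_vacuous(tight)}")
print(f"loose: {loose:.3f}, non-vacuous: {is_non_vacuous(loose)}")
```

The second call shows how a large KL term relative to the sample size pushes the value above 1, at which point the bound no longer says anything about a loss that is already at most 1.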

Theorems & Definitions (20)

Theorem 3.1
Theorem 3.2
Lemma 4.1
Lemma 4.2
proof
Lemma 4.3
Theorem 1.1
Lemma 1.2
proof
Lemma 1.3
...and 10 more