On the ERM Principle in Meta-Learning

Yannay Alon; Steve Hanneke; Shay Moran; Uri Shalit

TL;DR

This work develops a necessary and sufficient condition for meta-learnability using a bounded number of examples per domain and identifies how many examples per task are needed to achieve an error of $\varepsilon$ in the limit as the number of tasks $n$ goes to infinity.

Abstract

Classic supervised learning involves algorithms trained on $n$ labeled examples to produce a hypothesis $h \in \mathcal{H}$ aimed at performing well on unseen examples. Meta-learning extends this by training across $n$ tasks, with $m$ examples per task, producing a hypothesis class $\mathcal{H}$ within some meta-class $\mathbb{H}$. This setting applies to many modern problems such as in-context learning, hypernetworks, and learning-to-learn. A common method for evaluating the performance of supervised learning algorithms is through their learning curve, which depicts the expected error as a function of the number of training examples. In meta-learning, the learning curve becomes a two-dimensional learning surface, which evaluates the expected error on unseen domains for varying values of $n$ (number of tasks) and $m$ (number of training examples per task). Our findings characterize the distribution-free learning surfaces of meta-Empirical Risk Minimizers when either $m$ or $n$ tends to infinity: we show that the number of tasks must increase inversely with the desired error. In contrast, we show that the number of examples exhibits very different behavior: it satisfies a dichotomy where every meta-class conforms to one of the following conditions: (i) either $m$ must grow inversely with the error, or (ii) a \emph{finite} number of examples per task suffices for the error to vanish as $n$ goes to infinity. This finding illustrates and characterizes cases in which a small number of examples per task is sufficient for successful learning. We further refine this for positive values of $\varepsilon$ and identify for each $\varepsilon$ how many examples per task are needed to achieve an error of $\varepsilon$ in the limit as the number of tasks $n$ goes to infinity. We achieve this by developing a necessary and sufficient condition for meta-learnability using a bounded number of examples per domain.
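
To make the setting concrete, here is a minimal sketch of a meta-Empirical Risk Minimizer over a finite toy meta-class. It assumes one natural reading of the principle, which is not necessarily the paper's exact definition: among the candidate hypothesis classes, pick the one whose best-in-class empirical error, averaged over the $n$ training tasks, is smallest. All names (`meta_erm`, `best_in_class_error`, the toy classes and tasks) are hypothetical and introduced only for illustration.

```python
from typing import Callable, List, Tuple

Example = Tuple[int, int]              # (x, y) with a binary label y
Hypothesis = Callable[[int], int]
HypothesisClass = List[Hypothesis]


def empirical_error(h: Hypothesis, sample: List[Example]) -> float:
    """Fraction of examples in one task's sample that h mislabels."""
    return sum(h(x) != y for x, y in sample) / len(sample)


def best_in_class_error(H: HypothesisClass, sample: List[Example]) -> float:
    """Empirical error of an ERM within H on a single task's sample."""
    return min(empirical_error(h, sample) for h in H)


def meta_erm(meta_class: List[HypothesisClass],
             tasks: List[List[Example]]) -> int:
    """Index of the class minimizing the average per-task ERM error.

    Assumption: this averaging rule is an illustrative formalization,
    not a definition taken from the paper.
    """
    def meta_error(H: HypothesisClass) -> float:
        return sum(best_in_class_error(H, s) for s in tasks) / len(tasks)

    return min(range(len(meta_class)), key=lambda i: meta_error(meta_class[i]))


# Toy meta-class (hypothetical): constant functions vs. threshold functions.
constants: HypothesisClass = [lambda x: 0, lambda x: 1]
thresholds: HypothesisClass = [lambda x, t=t: int(x >= t) for t in range(5)]
meta_class = [constants, thresholds]

# n = 3 tasks with m = 4 examples each, every task labeled by some threshold.
tasks = [
    [(0, 0), (1, 0), (2, 1), (3, 1)],
    [(0, 0), (1, 1), (2, 1), (3, 1)],
    [(0, 0), (1, 0), (2, 0), (3, 1)],
]

chosen = meta_erm(meta_class, tasks)   # index into meta_class
```

In this toy run every task is perfectly fit by some threshold, while no constant function fits any task, so the rule selects the threshold class. The learning surface discussed in the abstract describes how the error of the selected class on a new, unseen task behaves as $n$ and $m$ vary.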
