On the ERM Principle in Meta-Learning

Yannay Alon, Steve Hanneke, Shay Moran, Uri Shalit

TL;DR

This work develops a necessary and sufficient condition for meta-learnability using a bounded number of examples per domain and identifies how many examples per task are needed to achieve an error of $\varepsilon$ in the limit as the number of tasks $n$ goes to infinity.

Abstract

Classic supervised learning involves algorithms trained on $n$ labeled examples to produce a hypothesis $h \in \mathcal{H}$ aimed at performing well on unseen examples. Meta-learning extends this by training across $n$ tasks, with $m$ examples per task, producing a hypothesis class $\mathcal{H}$ within some meta-class $\mathbb{H}$. This setting applies to many modern problems such as in-context learning, hypernetworks, and learning-to-learn. A common method for evaluating the performance of supervised learning algorithms is through their learning curve, which depicts the expected error as a function of the number of training examples. In meta-learning, the learning curve becomes a two-dimensional learning surface, which evaluates the expected error on unseen domains for varying values of $n$ (number of tasks) and $m$ (number of training examples per task). Our findings characterize the distribution-free learning surfaces of meta-Empirical Risk Minimizers when either $m$ or $n$ tends to infinity: we show that the number of tasks must increase inversely with the desired error. In contrast, we show that the number of examples exhibits very different behavior: it satisfies a dichotomy where every meta-class conforms to one of the following conditions: (i) either $m$ must grow inversely with the error, or (ii) a finite number of examples per task suffices for the error to vanish as $n$ goes to infinity. This finding illustrates and characterizes cases in which a small number of examples per task is sufficient for successful learning. We further refine this for positive values of $\varepsilon$ and identify for each $\varepsilon$ how many examples per task are needed to achieve an error of $\varepsilon$ in the limit as the number of tasks $n$ goes to infinity. We achieve this by developing a necessary and sufficient condition for meta-learnability using a bounded number of examples per domain.
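
To make the setting concrete, the following is a minimal sketch of a meta-Empirical Risk Minimizer over a finite meta-class. It assumes one natural reading of meta-ERM, which the abstract does not spell out: pick the hypothesis class $\mathcal{H} \in \mathbb{H}$ that minimizes the average, over the $n$ training tasks, of the best empirical error achievable within $\mathcal{H}$ on that task's $m$ examples. The toy domain, the two classes (`thresholds`, `constants`), and all function names are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[int, int]              # (x, y) with a binary label y
Hypothesis = Callable[[int], int]
HypothesisClass = Sequence[Hypothesis]


def empirical_error(h: Hypothesis, sample: Sequence[Example]) -> float:
    """Fraction of examples in the sample that h mislabels."""
    return sum(h(x) != y for x, y in sample) / len(sample)


def best_in_class_error(H: HypothesisClass, sample: Sequence[Example]) -> float:
    """Empirical error of an ERM within the class H on one task's sample."""
    return min(empirical_error(h, sample) for h in H)


def meta_erm(meta_class: Sequence[HypothesisClass],
             tasks: Sequence[Sequence[Example]]) -> int:
    """Index of the class minimizing the average per-task ERM error.

    This objective is an assumption made for illustration, not a
    definition quoted from the paper.
    """
    def meta_risk(H: HypothesisClass) -> float:
        return sum(best_in_class_error(H, s) for s in tasks) / len(tasks)

    return min(range(len(meta_class)), key=lambda i: meta_risk(meta_class[i]))


# Hypothetical toy meta-class over the domain {0, 1, 2, 3}.
thresholds: List[Hypothesis] = [lambda x, t=t: int(x >= t) for t in range(5)]
constants: List[Hypothesis] = [lambda x: 0, lambda x: 1]
meta_class = [thresholds, constants]

# n = 2 tasks with m = 3 examples each; both are consistent with a threshold.
tasks = [
    [(0, 0), (1, 0), (2, 1)],
    [(0, 0), (1, 1), (3, 1)],
]

chosen = meta_erm(meta_class, tasks)   # index of the selected class
```

On this toy data the threshold class fits both tasks exactly, while the constant class errs on one of the three examples in each task, so the sketch selects index 0. The learning surface discussed in the abstract describes how the expected error of such a selection on a new task behaves as $n$ and $m$ vary.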

Paper Structure

This paper contains 19 sections, 8 theorems, 50 equations.

Key Result

Theorem 1

Let $\mathbb{H}$ be a finite VC meta-hypothesis family. Then, Where $O_\mathbb{H}$ hides constants that may depend on $\mathbb{H}$ only.

Theorems & Definitions (30)

  • Definition 2.1: VC family
  • Definition 2.2: ERM learning surface
  • Theorem 1: ERM learning surface upper bound
  • Proof: Proof idea
  • Definition 2.3: Learning surface projection
  • Corollary 1
  • Definition 2.4: Informal definition of non-trivial meta-hypothesis family
  • Definition 2.5: $\varepsilon$ dual Helly number
  • Theorem 2: Learning surface's projections lower bound
  • Definition 4.1: Weak non-separability
  • ...and 20 more