Incremental Learning-to-Learn with Statistical Guarantees

Giulia Denevi, Carlo Ciliberto, Dimitris Stamos, Massimiliano Pontil

TL;DR

The paper tackles learning-to-learn (LTL) in an online, lifelong setting where tasks arrive sequentially from an unknown meta-distribution. It proposes an online meta-learning method that updates the positive semidefinite matrix $D$ parameterizing ridge regression via projected stochastic subgradient steps, yielding non-asymptotic guarantees on excess transfer risk. A key contribution is the decomposition of this risk into a uniform generalization error and an excess future empirical risk, with bounds that match batch LTL up to constants while demanding far less memory and computation. Empirical results on synthetic data and the Schools dataset illustrate favorable generalization and notable speedups relative to batch LTL and independent task learning (ITL), supporting the practicality of the online approach. Overall, the work provides a principled, scalable framework for incremental meta-learning with provable guarantees, and leaves room for extensions to broader losses and LTL algorithms.

Abstract

In learning-to-learn the goal is to infer a learning algorithm that works well on a class of tasks sampled from an unknown meta distribution. In contrast to previous work on batch learning-to-learn, we consider a scenario where tasks are presented sequentially and the algorithm needs to adapt incrementally to improve its performance on future tasks. Key to this setting is for the algorithm to rapidly incorporate new observations into the model as they arrive, without keeping them in memory. We focus on the case where the underlying algorithm is ridge regression parameterized by a positive semidefinite matrix. We propose to learn this matrix by applying a stochastic strategy to minimize the empirical error incurred by ridge regression on future tasks sampled from the meta distribution. We study the statistical properties of the proposed algorithm and prove non-asymptotic bounds on its excess transfer risk, that is, the generalization performance on new tasks from the same meta distribution. We compare our online learning-to-learn approach with a state of the art batch method, both theoretically and empirically.
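The procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions made only for illustration: the inner algorithm is taken to be $w_D = \arg\min_w \frac{1}{n}\|Xw - y\|^2 + w^\top D^{-1} w$, whose closed form is $w_D = D X^\top (X D X^\top + n I)^{-1} y$; the feasible set is taken to be $\{D \succeq 0,\ \mathrm{tr}(D) \le 1/\lambda\}$; the constant step size `step` and the averaging of iterates are assumed choices; and all function names are hypothetical. Under these assumptions the empirical error of ridge regression on a task equals $n\,\|(X D X^\top + n I)^{-1} y\|^2$, which is what the code differentiates.

```python
import numpy as np

def ridge_weights(D, X, y):
    """Closed-form ridge solution for the regularizer w^T D^{-1} w (assumed form)."""
    n = X.shape[0]
    M = X @ D @ X.T + n * np.eye(n)
    return D @ X.T @ np.linalg.solve(M, y)

def empirical_error_and_grad(D, X, y):
    """Training error of ridge_weights(D, X, y) and its gradient with respect to D."""
    n = X.shape[0]
    M = X @ D @ X.T + n * np.eye(n)
    a = np.linalg.solve(M, y)   # M^{-1} y
    b = np.linalg.solve(M, a)   # M^{-2} y
    loss = n * (a @ a)          # equals mean((X w_D - y)^2)
    grad = -n * X.T @ (np.outer(a, b) + np.outer(b, a)) @ X
    return loss, grad

def project_psd_trace(D, radius):
    """Euclidean projection onto {D PSD, trace(D) <= radius}."""
    D = (D + D.T) / 2
    vals, vecs = np.linalg.eigh(D)
    vals = np.clip(vals, 0.0, None)
    if vals.sum() > radius:
        # Project the eigenvalues onto the simplex of the given radius.
        u = np.sort(vals)[::-1]
        css = np.cumsum(u) - radius
        k = np.arange(1, len(u) + 1)
        rho = np.nonzero(u - css / k > 0)[0][-1]
        theta = css[rho] / (rho + 1)
        vals = np.clip(vals - theta, 0.0, None)
    return (vecs * vals) @ vecs.T

def online_ltl(tasks, d, lam, step):
    """One pass over a stream of (X, y) tasks; returns the average iterate.

    Each task is used once and then discarded, so memory does not grow
    with the number of tasks.
    """
    D = np.zeros((d, d))
    D_sum = np.zeros((d, d))
    count = 0
    for X, y in tasks:
        D_sum += D
        count += 1
        _, grad = empirical_error_and_grad(D, X, y)
        D = project_psd_trace(D - step * grad, 1.0 / lam)
    return D_sum / max(count, 1)
```

The returned matrix can be passed to `ridge_weights` together with the training data of a new task to obtain that task's weight vector. Because `tasks` may be any iterable, including a generator, the sketch never needs to hold earlier datasets in memory, which mirrors the incremental setting the abstract describes.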

Paper Structure

This paper contains 28 sections, 19 theorems, 54 equations, 3 figures, 1 table, 3 algorithms.

Key Result

Proposition 1

Let ${\cal X} \subseteq {\cal B}_1$, ${\cal Y} \subseteq [0,1]$ and $\ell$ be the square loss. Then, for any dataset $Z\in{\cal Z}^n$ the following properties hold:

Figures (3)

  • Figure 1: Relative improvement (in $\%$) of our online LTL algorithm over the ITL baseline for a varying range of training tasks and number of samples per task.
  • Figure 2: Performance of online LTL, batch LTL, ITL and MTL (on the test set) during one single trial of online learning on the synthetic dataset as the number of training tasks increases incrementally.
  • Figure 3: Percentage explained variance of online LTL, batch LTL, ITL and MTL (on the test set) during one single trial of online learning on the Schools dataset as the number of training tasks increases incrementally.

Theorems & Definitions (20)

  • Proposition 1: Properties of ${\cal L}_Z$ for the Square Loss
  • Theorem 2: Online LTL Bound
  • Proposition 2: Uniform Generalization Error Bound for Algorithm \ref{alg:general-pssa}
  • Lemma 2: Regret Bound for Algorithm \ref{alg:general-pssa}
  • Proposition 2: Excess Future Empirical Risk Bound for Algorithm \ref{alg:general-pssa}
  • Theorem 2: Batch LTL Bound
  • Lemma 3: Lemma 11 in maurer2005algorithmic
  • Theorem 4: Theorem 4 in maurer2009transfer
  • Theorem 5: Theorem 6 in maurer2009transfer
  • Theorem 6: Theorem 8 in maurer2009transfer
  • ...and 10 more