Incremental Learning-to-Learn with Statistical Guarantees
Giulia Denevi, Carlo Ciliberto, Dimitris Stamos, Massimiliano Pontil
TL;DR
The paper studies learning-to-learn (LTL) in an online, lifelong setting where tasks arrive sequentially from an unknown meta-distribution. It proposes an online meta-algorithm that updates a positive semidefinite matrix $D$, which parameterizes ridge regression, via projected stochastic subgradient steps, and it proves non-asymptotic bounds on the excess transfer risk. A key step is the decomposition of this risk into a uniform generalization term and an excess future empirical risk term; the resulting bounds match those of batch LTL up to constants while requiring far less memory and computation. Experiments on synthetic data and the Schools dataset illustrate favorable generalization and notable speedups relative to batch LTL and independent task learning (ITL), supporting the practicality of the online approach. Overall, the work provides a principled, scalable framework for incremental meta-learning with provable guarantees, and it points to extensions to other loss functions and LTL algorithms.
Abstract
In learning-to-learn, the goal is to infer a learning algorithm that works well on a class of tasks sampled from an unknown meta-distribution. In contrast to previous work on batch learning-to-learn, we consider a scenario where tasks are presented sequentially and the algorithm needs to adapt incrementally to improve its performance on future tasks. Key to this setting is for the algorithm to rapidly incorporate new observations into the model as they arrive, without keeping them in memory. We focus on the case where the underlying algorithm is ridge regression parameterized by a positive semidefinite matrix. We propose to learn this matrix by applying a stochastic strategy to minimize the empirical error incurred by ridge regression on future tasks sampled from the meta-distribution. We study the statistical properties of the proposed algorithm and prove non-asymptotic bounds on its excess transfer risk, that is, the generalization performance on new tasks from the same meta-distribution. We compare our online learning-to-learn approach with a state-of-the-art batch method, both theoretically and empirically.
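The incremental scheme described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions introduced here for illustration: the regularizer is taken to be $\lambda\, w^\top D^{-1} w$, the feasible set is taken to be positive semidefinite matrices with trace at most $1/\lambda$, the step size decays as $1/\sqrt{k}$, and the returned matrix is the average of the iterates. The function names (`ridge_weights`, `empirical_error_and_grad`, `project_psd_trace`, `online_ltl`) are hypothetical.

```python
import numpy as np

def ridge_weights(D, X, y, lam):
    """Ridge regression parameterized by a PSD matrix D (assumed form):
    w = D X^T (X D X^T + n*lam*I)^{-1} y."""
    n = X.shape[0]
    M = X @ D @ X.T + n * lam * np.eye(n)
    return D @ X.T @ np.linalg.solve(M, y)

def empirical_error_and_grad(D, X, y, lam):
    """Empirical squared error of ridge_weights(D, ...) on (X, y),
    and its gradient with respect to D."""
    n = X.shape[0]
    M = X @ D @ X.T + n * lam * np.eye(n)
    a = np.linalg.solve(M, y)          # M^{-1} y
    b = np.linalg.solve(M, a)          # M^{-2} y
    # The residual y - X w equals n*lam*a, so the mean squared error is:
    loss = n * lam**2 * (a @ a)
    Xa, Xb = X.T @ a, X.T @ b
    grad = -n * lam**2 * (np.outer(Xa, Xb) + np.outer(Xb, Xa))
    return loss, grad

def project_psd_trace(D, max_trace):
    """Euclidean projection onto {D PSD, trace(D) <= max_trace}
    (the trace bound is an assumption of this sketch)."""
    vals, vecs = np.linalg.eigh((D + D.T) / 2)
    vals = np.clip(vals, 0.0, None)
    if vals.sum() > max_trace:
        # Project the eigenvalues onto the simplex of radius max_trace.
        u = np.sort(vals)[::-1]
        css = np.cumsum(u) - max_trace
        k = np.arange(1, len(u) + 1)
        rho = np.nonzero(u - css / k > 0)[0][-1]
        theta = css[rho] / (rho + 1)
        vals = np.clip(vals - theta, 0.0, None)
    return (vecs * vals) @ vecs.T

def online_ltl(tasks, d, lam, step=1.0):
    """One pass over a stream of (X, y) datasets; each dataset is used
    for a single projected gradient step and then discarded."""
    D = np.eye(d) / (lam * d)          # feasible start, trace = 1/lam
    D_avg = np.zeros((d, d))
    for k, (X, y) in enumerate(tasks, start=1):
        D_avg += (D - D_avg) / k       # running average of the iterates
        _, grad = empirical_error_and_grad(D, X, y, lam)
        D = project_psd_trace(D - step / np.sqrt(k) * grad, 1.0 / lam)
    return D_avg
```

Because `tasks` can be any iterator, the sketch stores only the current matrix and its running average, which mirrors the requirement that observations are incorporated as they arrive without being kept in memory.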
