Bridging Lifelong and Multi-Task Representation Learning via Algorithm and Complexity Measure
Zhi Wang, Chicheng Zhang, Ramya Korlakai Vinayak
TL;DR
The paper studies lifelong representation learning, where tasks arrive sequentially and share a common representation. It introduces a simple, practical algorithm that alternates between few-shot property tests (to decide whether the current representation can be reused) and memory-backed multi-task ERM (to refine the representation when a test fails). It also defines the task-eluder dimension to bound how often refinement is necessary. The main theoretical contribution is a finite-time bound: when the function classes involved have finite capacity and $\dim(\mathcal{H},\mathcal{F},\epsilon)$ is finite, the algorithm achieves $\epsilon$-excess risk on all $T$ tasks, updates the representation at most $O(\dim(\mathcal{H},\mathcal{F},\epsilon))$ times, and admits explicit sample and memory complexity expressions. The framework unifies lifelong learning with multi-task learning (MTL) and learning-to-learn (LTL) by treating multi-task ERM as an online mechanism for refining representations. Its practicality is demonstrated through synthetic and semi-synthetic experiments on regression and classification with both linear and deep representations. Overall, the work provides a principled, general theory for online transfer of representations under noise, together with concrete guidance for implementing lifelong learning systems that scale to modern feature extractors and datasets.
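The reuse-or-refine loop summarized above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not the authors' algorithm: representations are a finite set of coordinate selectors, each task head is a scalar weight fit by least squares, the "property test" is simplified to comparing the few-shot training MSE against a fixed `threshold`, and all names (`fit_head`, `multitask_erm`, `lifelong_learn`, `make_task`) and parameters (`n_test`, `n_erm`) are hypothetical. The paper's actual test, thresholds, and sample sizes come from its analysis and are not reproduced here.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Toy setting (assumption for illustration only): an input is a list of d floats,
# a "representation" is an index that keeps one coordinate, and a task head is a
# scalar weight w fit by 1-D least squares, so a task predicts y ~ w * x[rep].
Sample = Tuple[List[float], float]
Sampler = Callable[[int], List[Sample]]


def fit_head(rep: int, data: Sequence[Sample]) -> Tuple[float, float]:
    """Fit a task-specific head on top of representation `rep`; return (w, MSE)."""
    sxx = sum(x[rep] ** 2 for x, _ in data)
    sxy = sum(x[rep] * y for x, y in data)
    w = sxy / sxx if sxx > 0 else 0.0
    mse = sum((y - w * x[rep]) ** 2 for x, y in data) / len(data)
    return w, mse


def multitask_erm(reps: Sequence[int], memory: Sequence[Sequence[Sample]]) -> int:
    """Multi-task ERM: pick the shared representation that minimizes the summed
    training loss over all stored tasks, fitting a separate head per task."""
    return min(reps, key=lambda r: sum(fit_head(r, task)[1] for task in memory))


def lifelong_learn(tasks: Sequence[Sampler], reps: Sequence[int], threshold: float,
                   n_test: int, n_erm: int) -> Tuple[List[int], int]:
    """Hypothetical reuse-or-refine loop (a simplification, not the paper's test).

    For each arriving task, draw `n_test` samples and fit a head on the current
    representation. If the loss is at most `threshold`, reuse the representation;
    otherwise store `n_erm` samples of the task in memory and rerun multi-task ERM.
    Returns the representation used for each task and the number of refinements.
    """
    current = reps[0]
    memory: List[List[Sample]] = []
    used: List[int] = []
    refinements = 0
    for sample in tasks:
        _, loss = fit_head(current, sample(n_test))
        if loss > threshold:  # test failed: current representation looks inadequate
            memory.append(sample(n_erm))
            current = multitask_erm(reps, memory)
            refinements += 1
        used.append(current)
    return used, refinements


def make_task(rng: random.Random, true_rep: int, d: int, w: float,
              noise: float) -> Sampler:
    """Hypothetical task generator: y = w * x[true_rep] + Gaussian noise."""
    def sample(n: int) -> List[Sample]:
        data = []
        for _ in range(n):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            data.append((x, w * x[true_rep] + rng.gauss(0.0, noise)))
        return data
    return sample


# Usage: 20 tasks that all depend on coordinate 2 but with different weights.
rng = random.Random(0)
tasks = [make_task(rng, true_rep=2, d=4, w=rng.choice([-2.0, 1.5, 3.0]), noise=0.1)
         for _ in range(20)]
used, refinements = lifelong_learn(tasks, reps=list(range(4)), threshold=0.1,
                                   n_test=50, n_erm=200)
print("representations used:", used)
print("refinements:", refinements)
```

Because every toy task depends on the same coordinate, the first task should fail the few-shot test, one multi-task ERM call should select coordinate 2, and later tasks should reuse it, so the refinement count stays small. In the paper, the number of such refinements is what the task-eluder dimension bounds; this toy run does not compute that quantity.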
Abstract
In lifelong learning, a learner faces a sequence of tasks with shared structure and aims to identify and leverage that structure to accelerate learning. We study the setting where such structure is captured by a common representation of data. Unlike multi-task learning or learning-to-learn, where tasks are available upfront to learn the representation, lifelong learning requires the learner to make use of its existing knowledge while continually gathering partial information in an online fashion. In this paper, we consider a generalized framework of lifelong representation learning. We propose a simple algorithm that uses multi-task empirical risk minimization as a subroutine and establish a sample complexity bound based on a new notion we introduce: the task-eluder dimension. Our result applies to a wide range of learning problems involving general function classes. As concrete examples, we instantiate our result on classification and regression tasks under noise.
