LoRanPAC: Low-rank Random Features and Pre-trained Models for Bridging Theory and Practice in Continual Learning

Liangzu Peng, Juan Elenter, Joshua Agterberg, Alejandro Ribeiro, René Vidal

TL;DR

LoRanPAC addresses the numerical instability of continual learning with lifted, high-dimensional random features by continually truncating the smallest SVD factors of the lifted feature matrix $\bm{H}_{1:t}$. This yields a numerically stable, over-parameterized minimum-norm least-squares solution with theoretical guarantees: training and test errors remain small provided a suitable fraction of SVD components is retained, with bounds governed by the eigengap $\gamma_t$ and the accumulative truncation error $a_t$. The method combines pre-trained features with a shallow random-feature head, scales to hundreds of tasks, outperforms state-of-the-art CL baselines such as RanPAC on multiple datasets, and remains stable in Inc-1 scenarios. The contribution demonstrates how a principled numerical technique (continual SVD truncation) can be combined with theory-informed bounds to produce robust, scalable continual learning systems in practical settings.

Abstract

The goal of continual learning (CL) is to train a model that can solve multiple tasks presented sequentially. Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks. However, such methods lack theoretical guarantees, making them prone to unexpected failures. Conversely, principled CL approaches often fail to achieve competitive performance. In this work, we aim to bridge this gap between theory and practice by designing a simple CL method that is theoretically sound and highly performant. Specifically, we lift pre-trained features into a higher dimensional space and formulate an over-parametrized minimum-norm least-squares problem. We find that the lifted features are highly ill-conditioned, potentially leading to large training errors (numerical instability) and increased generalization errors. We address these challenges by continually truncating the singular value decomposition of the lifted features. Our approach, termed LoRanPAC, is stable with respect to the choice of hyperparameters, can handle hundreds of tasks, and outperforms state-of-the-art CL methods on multiple datasets. Importantly, our method satisfies a recurrence relation throughout its continual learning process, which allows us to prove it maintains small training and test errors by appropriately truncating a fraction of SVD factors. This results in a stable continual learning method with strong empirical performance and theoretical guarantees. Code available: https://github.com/liangzu/loranpac.
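
The pipeline described in the abstract (lift features, solve a minimum-norm least-squares problem, truncate the SVD) can be illustrated with a minimal batch sketch in NumPy. This is not the authors' implementation (see the linked repository for that): the ReLU random projection, the function names, the `keep_fraction` parameter, and the random stand-in data are all assumptions made for illustration, and the continual (task-by-task) aspect is left out.

```python
import numpy as np

def lift_features(X, P):
    """Lift features X (d, n) to dimension E with a random matrix P (E, d).

    The ReLU nonlinearity is an assumption of this sketch.
    """
    return np.maximum(P @ X, 0.0)

def truncated_min_norm_solution(H, Y, keep_fraction):
    """Return W (C, E) with W @ H ~= Y, built from the top singular factors of H.

    H is (E, M) lifted features, Y is (C, M) targets. keep_fraction is a
    hypothetical parameter: the fraction of singular factors retained.
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=False)  # s is sorted descending
    k = max(1, int(np.ceil(keep_fraction * len(s))))
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Truncated pseudoinverse applied to Y: W = Y V_k S_k^{-1} U_k^T
    return (Y @ Vt_k.T) / s_k @ U_k.T

# Random stand-ins for pre-trained features and targets (not real data).
rng = np.random.default_rng(0)
d, E, M, C = 16, 200, 30, 5
X = rng.standard_normal((d, M))
Y = rng.standard_normal((C, M))
P = rng.standard_normal((E, d))

H = lift_features(X, P)
for frac in (1.0, 0.75):
    W = truncated_min_norm_solution(H, Y, frac)
    mse = np.linalg.norm(W @ H - Y) ** 2 / M
    print(f"keep={frac:.2f}  train MSE={mse:.3e}  ||W||_F={np.linalg.norm(W):.3e}")
```

Dropping the smallest singular factors shrinks the norm of `W`, since each retained factor contributes a term scaled by $1/\sigma_i^2$ to $\|\bm{W}\|_{\textnormal{F}}^2$; the price is a nonzero training residual on the discarded directions.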

Paper Structure

This paper contains 38 sections, 15 theorems, 82 equations, 18 figures, 16 tables, 5 algorithms.

Key Result

Theorem 1

Let $\bm{B}_t$, $\gamma_t$, and $a_t$ be defined as in eq:define-B-tilde, eq:define-gap-gamma, and eq:define-a, respectively. If $\bm{Y}_{1:t} = \bm{W}^*_t \bm{H}_{1:t} + \mathcal{E}_{1:t}$ (eq:model-assumption), then the output $\widetilde{\bm{W}}_t = \bm{Y}_{1:t} \bm{H}_{1:t}^\top \cdots$

Figures (18)

  • Figure 1: Spectrum of $\bm{H}_{1:t}^\top \bm{H}_{1:t}$ and its impact on training losses and test accuracy ($E=10^5$); see also \ref{subsection:figures-data-analysis}. The matrix $\bm{H}_{1:t}^\top \bm{H}_{1:t}$ is ill-conditioned (\ref{fig:stability}a); training loss increases (\ref{fig:stability}c) and test accuracy drops (\ref{fig:stability}d) drastically when small eigenvalues (of order $10^{-5}$) invade the spectrum (\ref{fig:stability}b).
  • Figure 2: Final test accuracy as the truncation percentage $\zeta$ varies.
  • Figure 3: Training times (in minutes) for varying embedding dimensions $E$.
  • Figure 4: The average training MSE loss $\frac{1}{M_t} \| \bm{W} \bm{H}_{1:t} - \bm{Y}_{1:t}\|_{\textnormal{F}}^2$ of the incremental SVD solution to \ref{eq:1layer-min-norm} explodes when eigenvalues of order $10^{-5}$ emerge (\ref{fig:stability}b). LoRanPAC ($25\%$) truncates the smallest $25\%$ of the singular values and implements \ref{eq:LoRanPAC} online, stabilizing \ref{eq:1layer-min-norm}.
  • Figure 5: Training times (in minutes) of LoRanPAC and RanPAC for different embedding dimensions $E$. See also \ref{fig:training-time-inc5} in the main paper for similar results on the other four datasets.
  • ...and 13 more figures
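
The Figure 4 caption mentions an incremental SVD solution and an online implementation. One generic way to maintain a truncated SVD as new feature columns arrive is sketched below in NumPy. It is a simplified stand-in, not the paper's algorithm: the function names `update_truncated_svd` and `readout`, the rank cap `k`, the accumulated cross term `G`, and the random data are all hypothetical.

```python
import numpy as np

def update_truncated_svd(U, s, H_new, k):
    """Fold new columns H_new (E, m) into a rank-<=k factorization U diag(s).

    [U diag(s), H_new] has the same left singular vectors and singular values
    as [H_past_truncated, H_new], so past right singular vectors are not needed.
    """
    stacked = np.hstack([U * s, H_new])
    U2, s2, _ = np.linalg.svd(stacked, full_matrices=False)
    return U2[:, :k], s2[:k]

def readout(G, U, s):
    """Compute W = G U diag(s)^{-2} U^T, where G accumulates Y_t H_t^T over tasks."""
    return (G @ U) / s**2 @ U.T

rng = np.random.default_rng(0)
E, C, k = 64, 3, 20                      # lifted dim, outputs, retained rank
U, s = np.zeros((E, 0)), np.zeros(0)     # empty factorization before task 1
G = np.zeros((C, E))                     # running sum of Y_t H_t^T

for t in range(4):                       # four hypothetical tasks
    H_t = rng.standard_normal((E, 10))   # stand-in lifted features
    Y_t = rng.standard_normal((C, 10))   # stand-in targets
    U, s = update_truncated_svd(U, s, H_t, k)
    G += Y_t @ H_t.T

W = readout(G, U, s)
print(U.shape, s.shape, W.shape)
```

Only `U`, `s`, and `G` are stored between tasks, so memory in this sketch depends on `E`, `k`, and `C` rather than on the number of samples seen so far.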

Theorems & Definitions (34)

  • Remark 1
  • Remark 2
  • Remark 3: Time and Space Complexity
  • Theorem 1
  • Theorem 2
  • Remark 4
  • Remark 5
  • Lemma 1
  • Proof of \ref{lemma:truncation-equality}
  • Lemma 2
  • ...and 24 more