
Understanding the Role of Rehearsal Scale in Continual Learning under Varying Model Capacities

JinLi He, Liang Bai, Xian Yang

TL;DR

This work formulates rehearsal-based continual learning as a multidimensional, effectiveness-driven iterative optimization problem that provides a unified characterization across diverse performance metrics, and derives a closed-form analysis of adaptability, memorability, and generalization from the perspective of rehearsal scale.

Abstract

Rehearsal is one of the key techniques for mitigating catastrophic forgetting and has been widely adopted in continual learning algorithms due to its simplicity and practicality. However, the theoretical understanding of how rehearsal scale influences learning dynamics remains limited. To address this gap, we formulate rehearsal-based continual learning as a multidimensional effectiveness-driven iterative optimization problem, providing a unified characterization across diverse performance metrics. Within this framework, we derive a closed-form analysis of adaptability, memorability, and generalization from the perspective of rehearsal scale. Our results uncover several intriguing and counterintuitive findings. First, rehearsal can impair the model's adaptability, in sharp contrast to its traditionally recognized benefits. Second, increasing the rehearsal scale does not necessarily improve memory retention. When tasks are similar and noise levels are low, the memory error exhibits a diminishing lower bound. Finally, we validate these insights through numerical simulations and extended analyses on deep neural networks across multiple real-world datasets, revealing statistical patterns of rehearsal mechanisms in continual learning.
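
To make the iterative formulation concrete, the sketch below simulates rehearsal in a linear regression model. It is a minimal sketch under assumptions made here, not the authors' exact setup: isotropic Gaussian features, unit-norm task parameters that share a common component (the hypothetical `task_spread` controls how much tasks differ), a buffer of `s` samples drawn uniformly from earlier tasks, and an update that moves the weights the shortest distance onto the least-squares solution set of the new data plus the rehearsed samples. The function name `simulate_rehearsal` and the two error summaries it returns are illustrative stand-ins for the paper's adaptation and memory errors; the defaults for `n`, `s`, `T`, and `sigma` follow the figure captions.

```python
import numpy as np


def simulate_rehearsal(p, n=1000, s=500, T=8, sigma=0.02, task_spread=1.0, seed=0):
    """Hypothetical simulation of rehearsal-based continual learning in a linear model.

    Assumptions (not taken from the paper): isotropic Gaussian features, unit-norm
    task parameters w_t^* sharing a common component, and an update that moves the
    minimum distance from the previous weights onto the least-squares solution set
    of the current task data plus s rehearsed samples.
    Returns (mean adaptation error, final memory error), both as squared
    parameter distances.
    """
    rng = np.random.default_rng(seed)
    base = rng.standard_normal(p)
    base /= np.linalg.norm(base)

    w = np.zeros(p)
    w_stars, past_X, past_y = [], [], []
    adapt_errors = []
    for t in range(T):
        # Ground-truth parameters for task t, normalized so that ||w_t^*||^2 = 1.
        w_star = base + task_spread * rng.standard_normal(p) / np.sqrt(p)
        w_star /= np.linalg.norm(w_star)
        w_stars.append(w_star)

        # Fresh training data for task t with additive Gaussian label noise.
        X = rng.standard_normal((n, p))
        y = X @ w_star + sigma * rng.standard_normal(n)

        # Rehearsal buffer: up to s rows drawn uniformly from all earlier tasks' data.
        A, b = X, y
        if t > 0 and s > 0:
            X_old = np.vstack(past_X)
            y_old = np.concatenate(past_y)
            idx = rng.choice(len(y_old), size=min(s, len(y_old)), replace=False)
            A = np.vstack([X, X_old[idx]])
            b = np.concatenate([y, y_old[idx]])

        # Minimum-norm correction from the previous weights (overparameterized case);
        # when A has full column rank this reduces to ordinary least squares.
        w = w + np.linalg.pinv(A) @ (b - A @ w)

        past_X.append(X)
        past_y.append(y)
        # With isotropic features, ||w - w^*||^2 equals the excess test risk on task t.
        adapt_errors.append(np.sum((w - w_star) ** 2))

    adaptation = float(np.mean(adapt_errors))
    # Memory error here: average distance of the final weights to earlier tasks' parameters.
    memory = float(np.mean([np.sum((w - ws) ** 2) for ws in w_stars[:-1]])) if T > 1 else 0.0
    return adaptation, memory
```

For example, `simulate_rehearsal(p=300, n=100, s=50)` runs quickly. Sweeping `p` across `n + s`, or varying `s` at fixed `p`, gives a rough feel for the two regimes, although the numbers it produces are not the paper's results.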

Paper Structure (6 sections, 4 theorems, 12 equations, 7 figures, 4 tables)

Key Result

Theorem 4.1

Suppose that the two standing assumptions of the paper hold. Then the adaptation error of rehearsal-based continual learning takes the following forms: under the overparameterized regime, …; under the underparameterized regime, …. Here $\lambda := \frac{p-n-s}{p}$ and $a_{\text{noise}} := \frac{(1-\lambda^{T})\,p\,\sigma^{2}}{p-n-s-1}$, where larger $\lambda$ indicates greater overparameterization ($\lambda > 0$ exactly when $p > n + s$).
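
The two constants in Theorem 4.1 can be evaluated directly. The helper below is a hypothetical illustration (`theorem41_constants` is not from the paper) that computes $\lambda$ and $a_{\text{noise}}$ exactly as defined above; it assumes $p > n + s + 1$ so that the denominator of $a_{\text{noise}}$ is positive.

```python
def theorem41_constants(p, n, s, sigma, T):
    """Return (lambda, a_noise) as defined in Theorem 4.1.

    Hypothetical helper: only evaluates the case p > n + s + 1,
    where the denominator p - n - s - 1 is positive.
    """
    if p <= n + s + 1:
        raise ValueError("requires p > n + s + 1")
    # Fraction of parameters in excess of the n + s training samples;
    # larger values mean greater overparameterization.
    lam = (p - n - s) / p
    a_noise = (1 - lam ** T) * p * sigma ** 2 / (p - n - s - 1)
    return lam, a_noise


if __name__ == "__main__":
    # Illustrative sweep over the rehearsal scale s at an assumed p = 2000.
    for s in (0, 250, 500):
        lam, a_noise = theorem41_constants(p=2000, n=1000, s=s, sigma=0.02, T=8)
        print(f"s={s:4d}  lambda={lam:.3f}  a_noise={a_noise:.6f}")
```

With $n=1000$, $T=8$, $s=500$, and $\sigma=0.02$ as in the figure captions, and an illustrative $p=2000$, this gives $\lambda = 0.25$ and $a_{\text{noise}} \approx 1.6\times 10^{-3}$. At fixed $p$, raising $s$ lowers $\lambda$, so a larger rehearsal buffer pushes the model toward the underparameterized regime.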

Figures (7)

  • Figure 1: An ideal continual learning system should strike a delicate balance among the adaptation of newly acquired knowledge, the memorization of previously learned knowledge, and the generalization of unseen data distributions across diverse scenarios.
  • Figure 2: Adaptation error of rehearsal-based methods under different setups, where $T=8$, $n=1000$, and $\|\boldsymbol w_t^{*}\|^2=1$ for all $t\in\{1,\dots,T\}$. Subfigure settings: (a) $s=500$; (b) $\sigma=0.02$; (c) $s=500$, $\sigma=0.02$. Discrete points denote averages across runs.
  • Figure 3: Memory error of rehearsal-based methods under different setups, where $T=8$, $n=1000$, and $\|\boldsymbol w_t^{*}\|^2=1$ for all $t\in\{1,\dots,T\}$. Subfigure settings: (a) $s=500$; (b) $\sigma=0.02$; (c) $s=500$, $\sigma=0.02$. Discrete points denote averages over simulation runs.
  • Figure 4: Generalization error w.r.t. the number of model parameters or rehearsal samples, with $T=8$, $n=1000$ and $\|\boldsymbol w_t^{*}\|^2=1$. Subfigure settings: (a) $s=500$; (b) $\sigma=0.02$; (c) $s=500$, $\sigma=0.02$. Discrete points denote averages across runs.
  • Figure 5: Adaptation error on Tiny-ImageNet as the number of training classes increases; the legend indicates buffer sizes.
  • ...and 2 more figures

Theorems & Definitions (7)

  • Remark 3.2
  • Remark 3.4
  • Theorem 4.1: Adaptation error
  • Remark 4.2
  • Theorem 4.3: Memory error
  • Theorem 4.4: Generalization error
  • Proposition 4.5