Latent Spectral Regularization for Continual Learning

Emanuele Frascaroli, Riccardo Benaglia, Matteo Boschini, Luca Moschella, Cosimo Fiorini, Emanuele Rodolà, Simone Calderara

TL;DR

This work tackles catastrophic forgetting in rehearsal-based continual learning by analyzing how latent-space geometry evolves with sequential tasks. It reveals that replayed points from different classes increasingly mix in the latent space and proposes CaSpeR-IL, a spectral-geometry regularizer that promotes partitioned embeddings by shaping the Laplacian spectrum and maximizing the eigengap $\lambda_{g+1}-\lambda_g$. By integrating a loss term $\ell_{\text{CaSpeR}}=-\lambda_{g+1}+\sum_{j=1}^g\lambda_j$ (with a Monte Carlo approximation $\ell_{\text{CaSpeR}}^*$ for efficiency), CaSpeR-IL can be plugged into any rehearsal-based CL method and consistently improves final accuracy $\bar{A}_F$ while reducing forgetting $\bar{F}^*_F$ across multiple benchmarks. The analysis shows CaSpeR-IL produces more stable latent-space partitions, as evidenced by more diagonal functional maps and lower off-diagonal energy, indicating reduced interference between classes and enhanced transfer of knowledge across tasks.
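As a rough illustration of the loss term above, the sketch below evaluates $\ell_{\text{CaSpeR}}=-\lambda_{g+1}+\sum_{j=1}^g\lambda_j$ for a batch of latent features with NumPy. The graph construction (a symmetrized, unweighted k-NN graph), the choice of the symmetric normalized Laplacian, and the names `laplacian_spectrum`, `casper_loss`, and `k` are assumptions made for illustration; they are not confirmed by this summary. The sketch also leaves out the Monte Carlo approximation $\ell_{\text{CaSpeR}}^*$ and any gradient computation.

```python
import numpy as np


def laplacian_spectrum(feats, k):
    """Eigenvalues (ascending) of the normalized Laplacian of a k-NN graph.

    Assumption: the latent-geometry graph is a symmetrized, unweighted
    k-NN graph over Euclidean distances between feature vectors.
    """
    n = feats.shape[0]
    dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest points

    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)  # symmetrize: keep an edge if either side has it

    deg = adj.sum(axis=1)  # every node has at least k neighbors, so deg > 0
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)


def casper_loss(feats, g, k=5):
    """-lambda_{g+1} + sum_{j=1..g} lambda_j, with 1-indexed eigenvalues."""
    lam = laplacian_spectrum(feats, k)
    # 0-indexed: lam[:g] holds lambda_1..lambda_g, lam[g] is lambda_{g+1}.
    return -lam[g] + lam[:g].sum()
```

For two well-separated groups of points and $g=2$, the graph splits into two connected components, so the two smallest eigenvalues are numerically zero and the loss reduces to roughly $-\lambda_3$. Minimizing the loss then amounts to widening the eigengap after the $g$-th eigenvalue.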

Abstract

While biological intelligence grows organically as new knowledge is gathered throughout life, Artificial Neural Networks forget catastrophically whenever they face a changing training data distribution. Rehearsal-based Continual Learning (CL) approaches have been established as a versatile and reliable solution to overcome this limitation; however, sudden input disruptions and memory constraints are known to alter the consistency of their predictions. We study this phenomenon by investigating the geometric characteristics of the learner's latent space and find that replayed data points of different classes increasingly mix up, interfering with classification. Hence, we propose a geometric regularizer that enforces weak requirements on the Laplacian spectrum of the latent space, promoting a partitioning behavior. Our proposal, called Continual Spectral Regularizer for Incremental Learning (CaSpeR-IL), can be easily combined with any rehearsal-based CL approach and improves the performance of SOTA methods on standard benchmarks.

Paper Structure (29 sections, 7 equations, 4 figures, 8 tables, 1 algorithm)

Figures (4)

  • Figure 1: An overview of the proposed CaSpeR-IL regularizer. Rehearsal-based CL methods struggle to separate the latent-space projections of replay data points. Our proposal acts on the spectrum of the latent geometry graph to induce a partitioning behavior by maximizing the eigengap for the number of seen classes (best seen in color).
  • Figure 2: How CL alters a model's latent space. (a) A quantitative evaluation measured as Label-Signal Variation ($\sigma$) within the LGG for buffer data points (lower is better); (b) t-SNE embedding of the features computed by X-DER for buffered examples in later tasks (top). Interference between classes is visibly reduced when CaSpeR-IL is applied (bottom). All experiments are carried out on Split CIFAR-100; (a) uses buffer size $500$, (b) uses $2000$ (best seen in color).
  • Figure 3: For several rehearsal methods with and without CaSpeR-IL, the functional map magnitude matrices $\boldsymbol{C^{|\cdot|}}$ between the LGGs $\mathcal{G}^{\tau_5}$ and $\mathcal{G}^{\tau_{10}}$, computed on the test set of $\tau_1,\dots,\tau_5$ after training up to $\tau_5$ and $\tau_{10}$ respectively (Split CIFAR-100, buffer size $2000$). The closer $\boldsymbol{C^{|\cdot|}}$ is to a diagonal matrix, the less geometric distortion there is between $\mathcal{G}^{\tau_5}$ and $\mathcal{G}^{\tau_{10}}$. We report the first $25$ rows and columns of $\boldsymbol{C^{|\cdot|}}$, focusing on low-frequency correspondences (Ovsjanikov et al., 2012), and apply a $\boldsymbol{C^{|\cdot|}} > 0.15$ threshold to increase clarity.
  • Figure D: Wall-clock training time for the approaches evaluated in the experimental section, benchmarked on an identical hardware setup (NVIDIA V100 GPU). Training time grows linearly w.r.t. the number of per-batch forward steps.
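
The functional-map comparison described for Figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: both graphs are taken to be built on the same samples in the same order (so the pointwise correspondence is the identity), the bases are eigenvectors of the symmetric normalized Laplacian, and the function names, the truncation parameter `m`, and the `off_diagonal_energy` statistic are hypothetical choices made here.

```python
import numpy as np


def laplacian_basis(adj, m):
    """First m eigenvectors (lowest frequencies) of the normalized Laplacian.

    Assumes a symmetric adjacency matrix with no isolated nodes.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # columns are orthonormal, eigenvalues ascending
    return vecs[:, :m]


def functional_map_magnitude(adj_a, adj_b, m=25):
    """Entry-wise magnitude of the functional map from graph A to graph B.

    With an identity node correspondence and orthonormal bases, the map
    is the change of basis Phi_B^T Phi_A.
    """
    phi_a = laplacian_basis(adj_a, m)
    phi_b = laplacian_basis(adj_b, m)
    return np.abs(phi_b.T @ phi_a)


def off_diagonal_energy(c_abs):
    """Fraction of squared magnitude lying off the diagonal (0 = diagonal)."""
    total = (c_abs ** 2).sum()
    diag = (np.diag(c_abs) ** 2).sum()
    return (total - diag) / total
```

Comparing a graph with itself yields the identity matrix and an off-diagonal energy of zero; larger values indicate that low-frequency structure in one graph is spread across many frequencies in the other.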