Exemplar-Free Continual Learning for State Space Models

Isaac Ning Lee, Leila Mahmoodi, Trung Le, Mehrtash Harandi

TL;DR

This work introduces Inf-SSM, a geometry-aware regularization framework for exemplar-free continual learning with State-Space Models. By representing SSM dynamics with the extended observability subspace and measuring distances on the infinite-dimensional Grassmannian, Inf-SSM constrains state evolution to preserve prior knowledge without storing past data. The authors derive an efficient $\mathcal{O}(n^2)$ solution to the Sylvester equation required for Gram-matrix computations, and report reduced forgetting with maintained accuracy on challenging benchmarks such as ImageNet-R and Caltech-256. Across ablations and integrations with replay-based CL methods, Inf-SSM shows consistent performance gains and compatibility with existing continual learning strategies. The approach highlights the value of leveraging SSM geometry and infinite-horizon dynamics for scalable, exemplar-free continual learning.

Abstract

State-Space Models (SSMs) excel at capturing long-range dependencies with structured recurrence, making them well-suited for sequence modeling. However, their evolving internal states make them difficult to adapt under Continual Learning (CL). This is particularly difficult in exemplar-free settings, where the absence of prior data leaves updates to the dynamic SSM states unconstrained, resulting in catastrophic forgetting. To address this, we propose Inf-SSM, a novel and simple geometry-aware regularization method that utilizes the geometry of the infinite-dimensional Grassmannian to constrain state evolution during CL. Unlike classical continual learning methods that constrain weight updates, Inf-SSM regularizes the infinite-horizon evolution of SSMs encoded in their extended observability subspace. We show that enforcing this regularization requires solving a matrix equation known as the Sylvester equation, which typically incurs $\mathcal{O}(n^3)$ complexity. We develop an $\mathcal{O}(n^2)$ solution by exploiting the structure and properties of SSMs. This leads to an efficient regularization mechanism that can be seamlessly integrated into existing CL methods. Comprehensive experiments on challenging benchmarks, including ImageNet-R and Caltech-256, demonstrate a significant reduction in forgetting while improving accuracy across sequential tasks.
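To make the complexity claim concrete, here is a minimal sketch; it is not the authors' derivation. It assumes two discrete-time SSMs whose state matrices are diagonal, $\mathrm{diag}(a_1)$ and $\mathrm{diag}(a_2)$, with all entries of magnitude below one, and output matrices $C_1, C_2$ of shape $(p, n)$. Under those assumptions the Gram matrix of the two extended observability matrices, $G = \sum_{k \ge 0} (A_1^k)^\top C_1^\top C_2 A_2^k$, satisfies the Sylvester-type (Stein) equation $G = A_1^\top G A_2 + C_1^\top C_2$, which decouples entry by entry. The function names are hypothetical.

```python
import numpy as np

def gram_closed_form(a1, C1, a2, C2):
    """Entrywise solution of G = diag(a1) G diag(a2) + C1^T C2.

    Assumes diagonal state matrices with |a1[i] * a2[j]| < 1.
    Once Q = C1^T C2 is formed, the solve touches each of the
    n^2 entries exactly once.
    """
    Q = C1.T @ C2                          # (n, n) right-hand side
    return Q / (1.0 - np.outer(a1, a2))    # G_ij = Q_ij / (1 - a1_i a2_j)

def gram_truncated(a1, C1, a2, C2, horizon=500):
    """Reference value: sum the first `horizon` terms of the series."""
    G = np.zeros((a1.size, a2.size))
    L, R = C1.copy(), C2.copy()            # hold C1 A1^k and C2 A2^k
    for _ in range(horizon):
        G += L.T @ R
        L = L * a1                         # right-multiply by diag(a1)
        R = R * a2                         # right-multiply by diag(a2)
    return G
```

For a general dense state matrix, a standard Sylvester solver would instead rely on a Schur-type factorization, which is where the cubic cost comes from; in this sketch the diagonal assumption is what removes it. Forming $C_1^\top C_2$ still costs $\mathcal{O}(n^2 p)$, which is quadratic in $n$ when the output dimension $p$ is small.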

Paper Structure

This paper contains 48 sections, 9 theorems, 86 equations, 4 figures, 12 tables, 1 algorithm.

Key Result

Theorem 4

Let $({\mathbf{A}},{\mathbf{B}},{\mathbf{C}})$ and $({\mathbf{P}}{\mathbf{A}}{\mathbf{P}}^{-1},{\mathbf{P}}{\mathbf{B}},{\mathbf{C}}{\mathbf{P}}^{-1})$ be two equivalent representations of an SSM for ${\mathbf{P}} \in \mathrm{GL}(n)$. The subspace spanned by the extended observability matrices remai
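The invariance named in the title of Theorem 4 can be checked numerically on a finite truncation. The sketch below is an illustration under stated assumptions, not the paper's proof: it truncates the extended observability matrix at a finite horizon, draws a random observable pair, and uses one particular invertible ${\mathbf{P}}$. Since $({\mathbf{C}}{\mathbf{P}}^{-1})({\mathbf{P}}{\mathbf{A}}{\mathbf{P}}^{-1})^k = {\mathbf{C}}{\mathbf{A}}^k{\mathbf{P}}^{-1}$, the two stacked matrices differ only by right-multiplication with ${\mathbf{P}}^{-1}$, so their column spaces coincide. The helper names are hypothetical.

```python
import numpy as np

def observability(A, C, horizon):
    """Stack C, CA, ..., CA^(horizon-1) into a (horizon * p, n) matrix."""
    blocks, M = [], C.copy()
    for _ in range(horizon):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def projector(O):
    """Orthogonal projector onto the column space of O (full column rank assumed)."""
    Q, _ = np.linalg.qr(O)
    return Q @ Q.T

rng = np.random.default_rng(0)
n, p, horizon = 4, 2, 12

A = rng.standard_normal((n, n))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # rescale to spectral radius 0.9
C = rng.standard_normal((p, n))

R = rng.standard_normal((n, n))
P = R @ R.T + np.eye(n)        # symmetric positive definite, hence invertible
P_inv = np.linalg.inv(P)

O1 = observability(A, C, horizon)
O2 = observability(P @ A @ P_inv, C @ P_inv, horizon)

# Same subspace <=> same orthogonal projector.
gap = np.linalg.norm(projector(O1) - projector(O2))
```

Comparing projectors rather than the matrices themselves is deliberate: `O1` and `O2` are different matrices, and only the subspace they span is expected to agree.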

Figures (4)

  • Figure 1: CL performance of Inf-SSM compared with prior work. AIA captures overall performance across the entire task sequence, AA is the final performance of the model, and FM measures the proportion of performance lost across the task sequence. Inf-SSM outperforms prior CL methods (EWC [kirkpatrick2017EWC], SI [zenke2017SI], MAS [aljundi2018MAS], LwF [rebuffi2017icarl_LwFMC]) on all metrics.
  • Figure 2: At each point along the sequence length $\tau$, the SSM is characterized by its infinite observability subspace $O_{\infty}$, determined by the tuple $(\tilde{{\mathbf{A}}}, \tilde{{\mathbf{C}}})$ and visualized as a trajectory. The entire set of $O_{\infty}$ for each SSM is shown as a colored plane, and each trajectory corresponds to a point on the Grassmannian. Because $(\tilde{{\mathbf{A}}}, \tilde{{\mathbf{C}}})$ is considered over an infinite horizon, the pairwise Grassmannian distance $d_{\mathcal{G}r}$ between these points is computed; it is illustrated as a geodesic on the sphere representing the Grassmann manifold.
  • Figure 3: CKD analysis on Vim-small trained sequentially on ImageNet-R and CUB-200 over 10 tasks in the EFCIL setting, averaged across the 24 Vim blocks.
  • Figure 4: CKD analysis on Vim-small with ImageNet-R and CUB-200 over 10 tasks in the EFCIL setting, shown for each of the 24 SSM block layers.
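The Grassmannian distance referred to in the Figure 2 caption can be illustrated in finite dimensions. The sketch below is a standard textbook construction, not the paper's infinite-horizon computation: it takes two matrices whose columns span the subspaces, obtains principal angles from the singular values of the product of their orthonormal bases, and returns the geodesic distance as the 2-norm of those angles. The function names are hypothetical, and both inputs are assumed to have full column rank.

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles (radians) between the column spaces of X and Y."""
    Qx, _ = np.linalg.qr(X)                 # orthonormal basis of span(X)
    Qy, _ = np.linalg.qr(Y)                 # orthonormal basis of span(Y)
    cosines = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def geodesic_distance(X, Y):
    """Geodesic distance on the Grassmannian: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(X, Y)))
```

Because the result depends only on the spanned subspaces, replacing `X` by `X @ M` for any invertible `M` leaves the distance unchanged up to floating-point error, which is the property that makes a subspace distance insensitive to the choice of state-space basis.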

Theorems & Definitions (14)

  • Remark 1
  • Definition 2: P-equivalence
  • Definition 3: Extended Observability
  • Theorem 4: Invariance of the $\mathcal{S}_{\infty}$ under P-equivalence
  • Lemma 5
  • Lemma 6
  • Definition 7: Principal Angles
  • Lemma 8: P-Equivalence [doretto2003dynamic]
  • Theorem 9: Invariance of the extended Observability under P-equivalence
  • Lemma 10
  • ...and 4 more