Exemplar-Free Continual Learning for State Space Models
Isaac Ning Lee, Leila Mahmoodi, Trung Le, Mehrtash Harandi
TL;DR
This work introduces Inf-SSM, a geometry-aware regularization framework for exemplar-free continual learning with State-Space Models (SSMs). By representing SSM dynamics through the extended observability subspace and measuring distances on the infinite Grassmannian, Inf-SSM constrains state evolution to preserve prior knowledge without storing past data. The authors derive an $\mathcal{O}(n^2)$ solution to the Sylvester equation required for the Gram-matrix computations, down from the typical $\mathcal{O}(n^3)$, and report reduced forgetting together with improved accuracy on challenging benchmarks such as ImageNet-R and Caltech-256. In ablations and in combination with replay-based CL methods, Inf-SSM yields consistent gains, indicating that it is compatible with existing continual learning strategies. The approach highlights the value of leveraging SSM geometry and infinite-horizon dynamics for scalable, exemplar-free continual learning.
Abstract
State-Space Models (SSMs) excel at capturing long-range dependencies with structured recurrence, making them well-suited for sequence modeling. However, their evolving internal states make them challenging to adapt under Continual Learning (CL). Adaptation is particularly difficult in exemplar-free settings, where the absence of prior data leaves updates to the dynamic SSM states unconstrained, resulting in catastrophic forgetting. To address this, we propose Inf-SSM, a novel and simple geometry-aware regularization method that utilizes the geometry of the infinite-dimensional Grassmannian to constrain state evolution during CL. Unlike classical continual learning methods that constrain weight updates, Inf-SSM regularizes the infinite-horizon evolution of SSMs encoded in their extended observability subspace. We show that enforcing this regularization requires solving a matrix equation known as the Sylvester equation, which typically incurs $\mathcal{O}(n^3)$ complexity. We develop an $\mathcal{O}(n^2)$ solution by exploiting the structure and properties of SSMs. This leads to an efficient regularization mechanism that can be seamlessly integrated into existing CL methods. Comprehensive experiments on challenging benchmarks, including ImageNet-R and Caltech-256, demonstrate a significant reduction in forgetting while improving accuracy across sequential tasks.
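
To make the complexity claim concrete, the sketch below shows one way SSM structure can remove the cubic cost. It is an illustration under stated assumptions, not necessarily the derivation used in the paper. It assumes diagonal, stable state matrices $A_1 = \mathrm{diag}(a_1)$ and $A_2 = \mathrm{diag}(a_2)$ with $|a_{1,i}\, a_{2,j}| < 1$, and output matrices $C_1, C_2$. The Gram matrix between the two extended observability matrices, $G = \sum_{k \ge 0} (A_1^\top)^k C_1^\top C_2 A_2^k$, then satisfies the Sylvester (Stein) equation $A_1^\top G A_2 - G = -C_1^\top C_2$, whose solution is elementwise: $G_{ij} = (C_1^\top C_2)_{ij} / (1 - a_{1,i}\, a_{2,j})$. The function names `gram_closed_form` and `gram_truncated` are hypothetical.

```python
import numpy as np


def gram_closed_form(a1, C1, a2, C2):
    """Infinite-horizon Gram matrix for diagonal SSMs (illustrative assumption).

    a1, a2: 1-D arrays holding the diagonals of A1 and A2, with |a1_i * a2_j| < 1.
    C1, C2: output matrices of shape (p, n1) and (p, n2).
    Returns G with G[i, j] = (C1^T C2)[i, j] / (1 - a1[i] * a2[j]).
    """
    M = C1.T @ C2                           # right-hand side of the Stein equation
    return M / (1.0 - np.outer(a1, a2))     # one division per entry of G


def gram_truncated(a1, C1, a2, C2, horizon):
    """Reference value: sum the first `horizon` terms of the series directly."""
    G = np.zeros((a1.size, a2.size))
    for k in range(horizon):
        # C @ diag(a)**k equals C * a**k because A is diagonal.
        G += (C1 * a1**k).T @ (C2 * a2**k)
    return G


rng = np.random.default_rng(0)
a1 = np.array([0.9, -0.5, 0.3])
a2 = np.array([0.8, 0.1, -0.7])
C1 = rng.standard_normal((2, 3))
C2 = rng.standard_normal((2, 3))

G = gram_closed_form(a1, C1, a2, C2)
G_ref = gram_truncated(a1, C1, a2, C2, horizon=500)
residual = np.diag(a1) @ G @ np.diag(a2) - G + C1.T @ C2
```

Here `G_ref` approximates the same infinite sum by brute force, and `residual` checks that `G` satisfies the Stein equation. Once $C_1^\top C_2$ is formed, the closed form touches each of the $n^2$ entries once, whereas a general-purpose dense Sylvester solver is cubic in $n$.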
