Learning Mamba as a Continual Learner: Meta-learning Selective State Space Models for Efficient Continual Learning

Chongyang Zhao; Dong Gong

TL;DR

Problem: efficiently performing continual learning from non-stationary streams without storing all past representations. Approach: meta-learn a continual learner using a selective SSM (Mamba) and a selectivity regularization (MambaCL), enabling online sequence prediction with fixed-size hidden states. Key findings: MambaCL matches or surpasses Transformer-based baselines on diverse CL/MCL tasks while using fewer parameters and less computation, and shows robustness to long sequences, domain shifts, and noisy inputs. Significance: offers a memory-efficient, generalizable approach to continual adaptation suitable for resource-constrained deployment and real-world non-stationary data.

Abstract

Continual learning (CL) aims to efficiently learn from a non-stationary data stream, without storing or recomputing all seen samples. CL enables prediction on new tasks by incorporating sequential training samples. Building on this connection between CL and sequential modeling, meta-continual learning (MCL) aims to meta-learn an efficient continual learner as a sequence prediction model, with advanced sequence models like Transformers being natural choices. However, despite decent performance, Transformers rely on a linearly growing cache to store all past representations, which conflicts with CL's objective of not storing all seen samples and limits efficiency. In this paper, we focus on meta-learning sequence-prediction-based continual learners without retaining all past representations. While attention-free models with fixed-size hidden states (e.g., Linear Transformers) align with CL's essential goal and efficiency needs, they have shown limited effectiveness in MCL in previous literature. Given Mamba's strong sequence modeling performance and attention-free nature, we explore a key question: Can attention-free models like Mamba perform well on MCL? By formulating Mamba and the SSM for MCL tasks, we propose MambaCL, a meta-learned continual learner. To enhance MambaCL's training, we introduce selectivity regularization, leveraging the connection between Mamba and Transformers to guide its behavior over sequences. Furthermore, we study how Mamba and other models perform across various MCL scenarios through extensive and well-designed experiments. Our results highlight the promising performance and strong generalization of Mamba and attention-free models in MCL, demonstrating their potential for efficient continual learning and adaptation.
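
The memory contrast drawn in the abstract can be illustrated with a toy sketch. It is not the MambaCL model or the Mamba implementation: the function names, the sigmoid gate, and the per-channel biases are all hypothetical choices made for illustration. It only shows that a cache-based learner stores one entry per seen sample, while a recurrence with input-dependent (selective) gates keeps a state of constant size.

```python
import math


def kv_cache_sizes(stream):
    """Attention-style learner (toy): keeps every past sample in a cache.

    Returns the cache length after each step, which grows linearly.
    """
    cache = []
    sizes = []
    for x in stream:
        cache.append(x)          # one new entry per seen sample
        sizes.append(len(cache))
    return sizes


def selective_scan(stream, state_dim=4):
    """Toy selective recurrence with a fixed-size hidden state.

    Update rule (an illustrative assumption, not the paper's formulation):
        a_t[i] = sigmoid(x_t + i)              # input-dependent gate per channel
        h_t[i] = a_t[i] * h_{t-1}[i] + (1 - a_t[i]) * x_t
    Returns the final state and the state size after each step.
    """
    h = [0.0] * state_dim
    sizes = []
    for x in stream:
        gates = [1.0 / (1.0 + math.exp(-(x + i))) for i in range(state_dim)]
        h = [a * h_i + (1.0 - a) * x for a, h_i in zip(gates, h)]
        sizes.append(len(h))     # stays equal to state_dim
    return h, sizes


if __name__ == "__main__":
    stream = [0.5, -1.0, 2.0, 0.3, -0.7]
    print("cache sizes:", kv_cache_sizes(stream))
    final_state, state_sizes = selective_scan(stream)
    print("state sizes:", state_sizes)
```

In this sketch the cache length equals the number of samples seen so far, whereas the recurrent state stays at `state_dim` entries regardless of stream length. That constant-size property is what the abstract refers to as a fixed-size hidden state.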
