Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Sihao Liu, Yibo Yang, Xiaojie Li, David A. Clifton, Bernard Ghanem

TL;DR

We propose S6MOD, a plug-and-play module that can be integrated into most existing methods to directly improve adaptability. We also design a class-conditional routing algorithm for dynamic, uncertainty-based adjustment, and a contrastive discretization loss to optimize it.

Abstract

Online continual learning (OCL) seeks to learn new tasks from data streams that appear only once, while retaining knowledge of previously learned tasks. Most existing methods rely on replay and focus on enhancing memory retention through regularization or distillation. However, they often overlook the adaptability of the model, which limits its ability to learn generalizable and discriminative features incrementally from online training data. To address this, we introduce a plug-and-play module, S6MOD, which can be integrated into most existing methods and directly improve adaptability. Specifically, S6MOD introduces an extra branch after the backbone, where a mixture of discretization selectively adjusts parameters in a selective state space model, enriching selective scan patterns so that the model can adaptively select the most sensitive discretization method for the current dynamics. We further design a class-conditional routing algorithm for dynamic, uncertainty-based adjustment and implement a contrastive discretization loss to optimize it. Extensive experiments combining our module with various models demonstrate that S6MOD significantly enhances model adaptability, leading to substantial performance gains and state-of-the-art results.
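To make the idea of a mixture of discretization more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes a single-channel diagonal state space model, scalar inputs, linear "experts" that each propose a step size, a softmax router with top-k selection, and zero-order-hold discretization. All function and parameter names (`mixed_step_size`, `expert_params`, `router_params`, `top_k`) are hypothetical and chosen only for illustration.

```python
import math


def softplus(z):
    # Numerically stable softplus; keeps every proposed step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_step_size(u, expert_params, router_params, top_k):
    """Blend the step sizes proposed by the top_k highest-scoring experts.

    expert_params and router_params are lists of (weight, bias) pairs,
    one pair per discretization pattern (an assumption of this sketch).
    """
    gates = softmax([w * u + b for w, b in router_params])
    chosen = sorted(range(len(gates)), key=lambda n: gates[n], reverse=True)[:top_k]
    norm = sum(gates[n] for n in chosen)
    deltas = [softplus(w * u + b) for w, b in expert_params]
    return sum(gates[n] / norm * deltas[n] for n in chosen)


def zoh_discretize(delta, a_diag, b_vec):
    """Zero-order hold for a diagonal continuous-time SSM with nonzero a_diag."""
    a_bar = [math.exp(delta * a) for a in a_diag]
    b_bar = [(ab - 1.0) / a * b for ab, a, b in zip(a_bar, a_diag, b_vec)]
    return a_bar, b_bar


def scan(inputs, a_diag, b_vec, c_vec, expert_params, router_params, top_k):
    """Run the recurrence h_t = A_bar h_{t-1} + B_bar u_t, y_t = C h_t."""
    h = [0.0] * len(a_diag)
    outputs = []
    for u in inputs:
        # The step size is input-dependent, so the discretization changes per step.
        delta = mixed_step_size(u, expert_params, router_params, top_k)
        a_bar, b_bar = zoh_discretize(delta, a_diag, b_vec)
        h = [ab * hi + bb * u for ab, hi, bb in zip(a_bar, h, b_bar)]
        outputs.append(sum(c * hi for c, hi in zip(c_vec, h)))
    return outputs
```

In this sketch, `top_k` plays the role of the number of active discretization patterns: setting it to 1 uses a single pattern, while setting it to the number of experts blends all of them. How S6MOD actually chooses that number per class is described in the paper and is not reproduced here.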

Paper Structure

This paper contains 37 sections, 18 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: Framework of S6MOD. Our method (a) introduces a plug-and-play branch after the backbone, where features are learned through S6MOD and supervised by the ETF classifier to guide the base method. S6MOD (c) utilizes MoE to enhance the discretization of SSM and applies class-conditional routing (b) to dynamically adjust the discretization based on the uncertainty. Finally, we use a contrastive discretization loss (d) to supervise the learning of both generalizable and discriminative features.
  • Figure 2: t-SNE visualization of memory data at the end of training on CIFAR-100 ($M=2k$), showcasing baseline in (a) and (c) and baseline combined with S6MOD in (b) and (d). Different colors represent different classes.
  • Figure 3: Impact of dynamically selecting different $N_k$ values on the ability to learn new tasks and prevent forgetting of old tasks: New-Task accuracy represents the model's accuracy on the current task. We conduct experiments by setting $N_k = 1$, $N_k = N$, and calculating $N_k$ through class-conditional routing. The dataset used is CIFAR-100 ($M=2k$).
  • Figure 4: Impact of pattern number $N$. The dataset used is CIFAR-100 ($M=2k$).
  • Figure 5: t-SNE visualization of features before classification of memory data at the end of training on CIFAR-100 ($M=2k$).
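
The Figure 1 caption mentions an ETF classifier. As background, the sketch below builds a standard simplex equiangular tight frame (ETF): $K$ unit-norm class prototypes whose pairwise cosine similarity is $-1/(K-1)$. It assumes the optional orthogonal rotation is the identity and classifies by largest inner product; the paper's exact construction and training objective are not shown on this page, so `simplex_etf` and `classify` are hypothetical helper names.

```python
import math


def simplex_etf(num_classes):
    """Return num_classes prototypes forming a simplex ETF.

    Prototype i is sqrt(K / (K - 1)) * (e_i - (1 / K) * ones), where e_i is
    the i-th standard basis vector and K = num_classes.
    """
    k = num_classes
    if k < 2:
        raise ValueError("a simplex ETF needs at least two classes")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def classify(feature, prototypes):
    """Pick the class whose fixed prototype has the largest inner product."""
    scores = [dot(feature, p) for p in prototypes]
    return max(range(len(scores)), key=scores.__getitem__)
```

Because the prototypes are fixed and maximally separated, a feature aligned with prototype `i` is assigned to class `i`, which is the usual motivation for using an ETF as a non-learned classifier head.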