Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning

Depeng Li, Tianqi Wang, Junwei Chen, Wei Dai, Zhigang Zeng

TL;DR

This work tackles catastrophic forgetting in class-incremental learning by introducing AutoActivator, a network that dynamically grows only as needed per task under a supervisory mechanism. New neural units are recruited to maximize residual-error reduction, while an activation-threshold scheme enables task-agnostic reactivation during inference to prevent interference. The approach is underpinned by a universal approximation theorem guaranteeing convergence over sequential tasks and offers rehearsal-free, scalable expansion across diverse backbones and datasets. Empirical results across MNIST, FashionMNIST, CIFAR-100, and ImageNet-R show competitive or superior accuracy with minimal memory growth and zero or minimal forgetting, highlighting practical utility for privacy-sensitive and resource-constrained settings.

Abstract

Class-incremental learning (CIL) aims to train a model to learn new classes from non-stationary data streams without forgetting old ones. In this paper, we propose a new kind of connectionist model by tailoring neural unit dynamics that adapt the behavior of neural networks for CIL. In each training session, it introduces a supervisory mechanism to guide network expansion, whose growth size is compactly commensurate with the intrinsic complexity of a newly arriving task. This constructs a near-minimal network while allowing the model to expand its capacity when it cannot sufficiently hold new classes. At inference time, it automatically reactivates the required neural units to retrieve knowledge and leaves the remaining units inactive to prevent interference. We name our model AutoActivator, which is effective and scalable. To gain insights into the neural unit dynamics, we theoretically analyze the model's convergence property via a universal approximation theorem on learning sequential mappings, which is under-explored in the CIL community. Experiments show that our method achieves strong CIL performance in rehearsal-free and minimal-expansion settings with different backbones.
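
The growth step described above (recruit only the units that most reduce the residual error, and stop once the task is fitted well enough) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `grow_network`, the `tanh` activation, the uniform sampling range, and the simple residual-norm stopping rule are all assumptions standing in for the paper's supervisory mechanism.

```python
import numpy as np

def grow_network(X, y, n_candidates=50, max_nodes=30, tol=1e-2, seed=0):
    """Greedily add random tanh nodes until the residual norm drops below tol.

    Hypothetical stand-in for a supervisory mechanism: each round draws a
    batch of random candidates and keeps the one with the largest
    residual-error reduction.
    """
    rng = np.random.default_rng(seed)
    residual = y.astype(float).copy()      # e_0 = f, the network starts empty
    nodes = []                             # recruited (w, b, beta) triples
    history = [np.linalg.norm(residual)]   # residual norm after each step
    while len(nodes) < max_nodes and history[-1] > tol:
        # Candidate pool: random input weights and biases (assumed ranges).
        W = rng.uniform(-1.0, 1.0, size=(n_candidates, X.shape[1]))
        b = rng.uniform(-1.0, 1.0, size=n_candidates)
        G = np.tanh(X @ W.T + b)           # shape (n_samples, n_candidates)
        # Error reduction of each candidate: <e, g>^2 / ||g||^2.
        gain = (residual @ G) ** 2 / (np.sum(G ** 2, axis=0) + 1e-12)
        best = int(np.argmax(gain))
        g = G[:, best]
        beta = (residual @ g) / (g @ g + 1e-12)   # least-squares output weight
        residual = residual - beta * g
        nodes.append((W[best], b[best], beta))
        history.append(np.linalg.norm(residual))
    return nodes, history
```

Because the loop exits as soon as the tolerance is met, the number of recruited nodes tracks how hard the target is to fit, which is the intuition behind growth that is "commensurate with the intrinsic complexity" of a task.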

Paper Structure

This paper contains 24 sections, 2 theorems, 20 equations, 5 figures, 10 tables, 1 algorithm.

Key Result

Proposition 4.1

(Kwok and Yeung, 1997) Let $\Gamma$ be a set of basis functions $g$. For a fixed $g \in \Gamma$ $(\Vert g \Vert \neq 0)$, the expression $\Vert f - ( f_{L-1} + \beta_L g_L)\Vert$ achieves its minimum iff
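
The statement above is cut off in this extract. As a reading aid only, the standard least-squares argument for a single added basis function is sketched below. It assumes a real inner-product space with inner product $\langle \cdot, \cdot \rangle$ and introduces the residual notation $e_{L-1} = f - f_{L-1}$ and $e_L = e_{L-1} - \beta_L g_L$, which does not appear in the extracted text; it should not be taken as the paper's exact wording.

```latex
\begin{aligned}
\Vert e_{L-1} - \beta_L g_L \Vert^2
  &= \Vert e_{L-1} \Vert^2
     - 2 \beta_L \langle e_{L-1}, g_L \rangle
     + \beta_L^2 \Vert g_L \Vert^2, \\
\frac{\partial}{\partial \beta_L} \Vert e_{L-1} - \beta_L g_L \Vert^2 = 0
  &\iff \beta_L = \frac{\langle e_{L-1}, g_L \rangle}{\Vert g_L \Vert^2}, \\
\Vert e_{L-1} \Vert^2 - \Vert e_L \Vert^2
  &= \frac{\langle e_{L-1}, g_L \rangle^2}{\Vert g_L \Vert^2} \;\ge\; 0 .
\end{aligned}
```

The last line shows that, with this choice of $\beta_L$, adding a unit never increases the residual norm, and that the reduction is largest for the candidate $g_L$ most aligned with the current residual.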

Figures (5)

  • Figure 1: Overview of AutoActivator. It first generates several batches of random nodes, denoted by different shapes; then, together with existing nodes for knowledge transfer, it parsimoniously recruits new nodes that satisfy the supervisory mechanism (e.g., those in red circles) into a scalable neural unit, where the recruited nodes positively influence each other, as marked by black arrows. In AutoActivator, the earlier layers are built under the guidance of the supervisory mechanism (see the section on supervisory mechanisms), while the final classifier layer is updated step-wise by closed-form solutions (see the section on reactivation). The activation thresholds render a neural unit partially or entirely active, or inactive, for prediction.
  • Figure 2: Parameter analysis of supervisory mechanisms. We report the ACA, the cumulative number of nodes (Nodes), and the whole running time (Time) per task sequence under different $l$ and $T_{max}$.
  • Figure 3: t-SNE visualization where each color represents a class. (a) Mixed raw sample space based on the training data of ten classes as a reference. (b)-(f) Well-clustered representation space based on neural units' outputs after learning two classes per session.
  • Figure 4: Performance comparison on the intra-sequence imbalanced case. Left: CIFAR-{100(10), 100(20), 100(30), 100(40)}; Right: CIFAR-{100(2), 100(4), 100(6), $\dots$}.
  • Figure 5: Growth of the network as the number of tasks increases for CIFAR-100/10.

Theorems & Definitions (5)

  • Proposition 4.1
  • Theorem 4.2
  • proof
  • Remark 4.3
  • proof