IMEX-Reg: Implicit-Explicit Regularization in the Function Space for Continual Learning

Prashant Bhat, Bharath Renjith, Elahe Arani, Bahram Zonooz

TL;DR

Continual learning suffers from catastrophic forgetting, especially when the replay buffer is small. IMEX-Reg takes a two-pronged approach: implicit regularization via contrastive representation learning (CRL), and explicit function-space regularization that aligns the geometry of the classifier with that of the projection head, aided by exponential moving average (EMA) self-ensembling. The method improves generalization, increases robustness to adversarial and natural corruptions, and reduces task-recency bias across class-incremental (Class-IL), task-incremental (Task-IL), and generalized class-incremental (GCIL) settings, with theoretical intuition grounded in the Johnson-Lindenstrauss lemma. It is particularly relevant to memory-constrained scenarios and edge deployments, where leveraging unlabeled data through CRL can further boost performance.

Abstract

Continual learning (CL) remains one of the long-standing challenges for deep neural networks due to catastrophic forgetting of previously acquired knowledge. Although rehearsal-based approaches have been fairly successful in mitigating catastrophic forgetting, they suffer from overfitting on buffered samples and prior information loss, which hinders generalization under low-buffer regimes. Inspired by how humans learn using strong inductive biases, we propose IMEX-Reg to improve the generalization performance of experience rehearsal in CL under low-buffer regimes. Specifically, we employ a two-pronged implicit-explicit regularization approach using contrastive representation learning (CRL) and consistency regularization. To further leverage the global relationship between representations learned using CRL, we propose a regularization strategy to guide the classifier toward the activation correlations in the unit hypersphere of the CRL. Our results show that IMEX-Reg significantly improves generalization performance and outperforms rehearsal-based approaches in several CL scenarios. It is also robust to natural and adversarial corruptions and exhibits less task-recency bias. Additionally, we provide theoretical insights that further support our design decisions.
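
The idea of guiding the classifier toward the activation correlations of the CRL hypersphere can be illustrated with a small sketch. It rests on assumptions that the abstract does not specify: that both heads produce one vector per sample in a batch, that "activation correlations" are measured as pairwise cosine similarities within the batch, and that the mismatch is penalized with a mean squared error. The names `cosine_gram` and `alignment_loss` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def cosine_gram(z, eps=1e-12):
    """Pairwise cosine similarities of the rows of z (shape: batch x dim)."""
    z = np.asarray(z, dtype=float)
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    z_unit = z / np.maximum(norms, eps)  # project rows onto the unit hypersphere
    return z_unit @ z_unit.T

def alignment_loss(classifier_proj, projection_head):
    """Mean squared difference between the two batch similarity matrices.

    Assumption: the inputs may have different feature dimensions,
    but they must share the same batch size.
    """
    g_cls = cosine_gram(classifier_proj)
    g_crl = cosine_gram(projection_head)
    return float(np.mean((g_cls - g_crl) ** 2))

rng = np.random.default_rng(0)
head = rng.normal(size=(8, 128))      # hypothetical projection-head outputs
cls_same = head * 3.0                 # same directions, different scale
cls_rand = rng.normal(size=(8, 10))   # unrelated classifier-side features

print(alignment_loss(cls_same, head))  # close to 0: geometry already matches
print(alignment_loss(cls_rand, head))  # positive: geometry differs
```

Because both similarity matrices are batch-by-batch, the two heads can live in spaces of different dimension and still be compared, which is the property this kind of relational loss relies on.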

Paper Structure

This paper contains 34 sections, 1 theorem, 11 equations, 5 figures, 7 tables, 2 algorithms.

Key Result

Theorem 2

(Johnson-Lindenstrauss Lemma): Let $\epsilon \in (0,1)$ and $D_g > 0$ be such that for any integer $n$, $D_g \geq 4\left(\epsilon^{2} / 2-\epsilon^{3} / 3\right)^{-1} \ln n$. Then, for any set of points $Z \in \mathbb{R}^{D_h}$, there exists a mapping function $\mathcal{M}: \mathbb{R}^{D_h} \rightarrow \dots$
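
The dimension bound in the lemma, $D_g \geq 4\left(\epsilon^{2}/2-\epsilon^{3}/3\right)^{-1} \ln n$, is easy to evaluate numerically. The following is a minimal sketch that computes the smallest integer satisfying it; the function name `jl_min_dim` and the choice to round up to an integer are assumptions made for illustration.

```python
import math

def jl_min_dim(n_points, eps):
    """Smallest integer D_g with D_g >= 4 * ln(n) / (eps^2/2 - eps^3/3)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if n_points < 1:
        raise ValueError("n_points must be a positive integer")
    denom = eps ** 2 / 2.0 - eps ** 3 / 3.0
    return math.ceil(4.0 * math.log(n_points) / denom)

# The bound grows with the number of points and as the tolerance shrinks.
for n in (100, 1_000, 10_000):
    for eps in (0.1, 0.5):
        print(n, eps, jl_min_dim(n, eps))
```

Note that the bound depends only on the number of points $n$ and the distortion tolerance $\epsilon$, not on the original dimension $D_h$.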

Figures (5)

  • Figure 1: Implicit-Explicit Regularization in CL: IMEX-Reg employs CRL ($\mathcal{L}_{rep}$) and consistency regularization ($\mathcal{L}^g_{cr}$ and $\mathcal{L}^h_{cr}$) to bias learning towards generalization. To further leverage the desirable traits of learning on the unit hypersphere using CRL, IMEX-Reg aligns the geometric structures within the classifier projection’s hypersphere with those of the projection head’s hypersphere ($\mathcal{L}_{ecr}$), thereby compensating for the weak supervision under low-buffer regimes.
  • Figure 2: Comparison of Stability-Plasticity Trade-off for different CL models across different datasets.
  • Figure 3: (Left) Robustness to PGD adversarial attack at varying strengths and (Right) Average probability of predicting each task for different CL methods trained on Seq-CIFAR100 with 5 tasks. IMEX-Reg shows the highest robustness and the least recency bias with probabilities evenly distributed across tasks.
  • Figure 4: Relative top-1 accuracy (%) (averaged over 5 severity levels) for 19 different natural corruptions for different CL models trained on Seq-CIFAR100 with 5 tasks. The average accuracy across all corruptions is shown as mCA.
  • Figure 5: Reliability diagrams with Expected Calibration Error (ECE) for CL methods trained on Seq-CIFAR100 with 5 tasks. A lower ECE value indicates a better-calibrated model. Compared to the baselines, IMEX-Reg is well calibrated and has the lowest ECE value.
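
The Expected Calibration Error reported in Figure 5 is commonly computed by binning predictions by confidence and averaging the gap between accuracy and confidence in each bin, weighted by bin size. The sketch below follows that standard definition; the use of 10 equal-width bins and top-1 confidence is an assumption, since the binning used for the figure is not stated here, and the function name is hypothetical.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over confidence bins.

    confidences: top-1 predicted probabilities in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index in 0..n_bins-1 using intervals (lo, hi]; 0.0 falls in bin 0.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return float(ece)

# An overconfident toy model: 90% confidence but only 50% accuracy.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))
```

A reliability diagram plots the per-bin accuracy against the per-bin confidence; ECE summarizes the distance of that curve from the diagonal in a single number.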

Theorems & Definitions (2)

  • Conjecture 1
  • Theorem 2