Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding

Depeng Li, Tianqi Wang, Junwei Chen, Qining Ren, Kenji Kawaguchi, Zhigang Zeng

TL;DR

This paper tackles catastrophic forgetting under a strict continual learning setting that forbids access to past data and minimizes model expansion. It introduces CLDNet, a framework that combines HSIC-Bottleneck Orthogonalization (HBO) for non-overwriting, dependency-aware updates with EquiAngular Embedding (EAE) for parameter-free, prototype-based decision boundaries. HBO minimizes dependence on inputs while maximizing dependence on outputs across layers, using a recursive orthogonal projector to regulate gradient updates. EAE replaces traditional classifiers with Equiangular Basis Vectors (EBVs), aligning representations to fixed class prototypes and enabling scalable, discriminative boundaries. Together, HBO and EAE deliver competitive accuracy without replay or growth, demonstrating strong stability-plasticity trade-offs in memory- and privacy-constrained continual learning scenarios.
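The dependence terms that HBO trades off can be illustrated with the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)^2. The sketch below is not the paper's implementation: the Gaussian kernel, the bandwidth `sigma`, the weight `beta`, and the function names are all assumptions chosen for illustration, and the summary above does not specify how CLDNet normalizes or weights these terms.

```python
import numpy as np

def rbf_kernel(a, sigma=1.0):
    """Gaussian kernel matrix over the rows of a, shape (n_samples, n_features)."""
    sq = np.sum(a ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * a @ a.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(a, b, sigma=1.0):
    """Biased empirical HSIC between paired samples a and b."""
    n = a.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k, l = rbf_kernel(a, sigma), rbf_kernel(b, sigma)
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2

def hsic_bottleneck_loss(x, y, z, beta=1.0, sigma=1.0):
    """Hypothetical per-layer objective: low dependence of the hidden
    representation z on inputs x, high dependence on targets y."""
    return hsic(x, z, sigma) - beta * hsic(y, z, sigma)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
dependent = hsic(x, x ** 2)                       # nonlinear dependence
independent = hsic(x, rng.normal(size=(200, 1)))  # unrelated sample
```

In this toy comparison the estimate for `x` against `x ** 2` comes out larger than the estimate for two unrelated samples, which is the property a bottleneck objective of this kind relies on.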

Abstract

Deep neural networks are susceptible to catastrophic forgetting when trained on sequential tasks. Many continual learning (CL) methods rely on exemplar buffers and/or network expansion to balance model stability and plasticity, which compromises their practical value because of privacy and memory concerns. This paper instead considers a strict yet realistic setting in which the training data from previous tasks is unavailable and the model size remains relatively constant during sequential training. To achieve such desiderata, we propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. The method relies on the synergy between two key components: HSIC-Bottleneck Orthogonalization (HBO) implements non-overwriting parameter updates mediated by the Hilbert-Schmidt independence criterion in an orthogonal space, and EquiAngular Embedding (EAE) enhances decision boundary adaptation between old and new tasks with predefined basis vectors. Extensive experiments demonstrate that our method achieves competitive accuracy while using zero exemplar buffer and only 1.02x the size of the base model.
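To make the idea of predefined, equiangular class prototypes concrete, the sketch below builds a simplex equiangular tight frame and classifies by cosine similarity to the nearest prototype. This is a swapped-in construction used only for illustration: the way CLDNet actually generates its basis vectors is not described here, and the function names `simplex_etf` and `predict` are hypothetical.

```python
import numpy as np

def simplex_etf(num_classes):
    """Return num_classes unit-norm prototypes (one per row) whose pairwise
    cosine similarity is exactly -1 / (num_classes - 1)."""
    k = num_classes
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)

def predict(features, prototypes):
    """Assign each feature vector to the prototype with highest cosine similarity."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return np.argmax(f @ p.T, axis=1)

prototypes = simplex_etf(5)
gram = prototypes @ prototypes.T  # ones on the diagonal, -0.25 elsewhere

# Features that sit near their class prototype are recovered correctly.
rng = np.random.default_rng(0)
noisy = prototypes + 0.05 * rng.normal(size=prototypes.shape)
labels = predict(noisy, prototypes)
```

Because the prototypes are fixed in advance, a classifier of this form adds no trainable parameters: training only has to pull each class's last-layer representation towards its assigned vector.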

Paper Structure

This paper contains 13 sections, 9 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison between our method and representative CL approaches. (a) Rehearsal-based methods are often sensitive to buffer size. (b) Some architecture-based methods grow rapidly during sequential training. (c) Most regularization-based methods struggle with the stability-plasticity dilemma, and their performance is unsatisfactory in class-IL (hybrids with (a) and/or (b) excluded). By contrast, our method meets multiple CL desiderata simultaneously.
  • Figure 2: Overview of CLDNet. HBO transforms learning task $t$ into a constrained statistical dependency mini-max problem and EAE predicts by matching class-specific basis vectors. Systematically, the last-layer hidden representation $Z_L$ is bound to any one of the available basis vectors in grey for recognizing a new class. We mark this process in red.
  • Figure 3: Changes of the rank of orthogonal projectors.
  • Figure 4: t-SNE visualization on split FashionMNIST. Each color represents a class, and each task (session) contributes two classes. Panels (a)-(e) show the representations of all classes trained so far.
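The orthogonal projectors whose rank Figure 3 tracks can be illustrated with a recursive-least-squares-style update of the kind used in orthogonal weight modification. This is an assumed stand-in, since the exact recursion used by HBO is not given here; `update_projector`, `project_gradient`, and the constant `alpha` are hypothetical names introduced for the sketch.

```python
import numpy as np

def update_projector(p, x, alpha=1e-6):
    """Shrink the symmetric projector p so that it (almost) annihilates the
    input direction x while leaving directions orthogonal to x untouched."""
    px = p @ x
    return p - np.outer(px, px) / (alpha + x @ px)

def project_gradient(p, grad):
    """Route a weight gradient of shape (in_dim, out_dim) through p so the
    update barely changes the layer output for previously seen inputs."""
    return p @ grad

in_dim, out_dim = 4, 3
p = np.eye(in_dim)                    # start with no protected directions
x_old = np.array([1.0, 2.0, 0.0, 0.0])
p = update_projector(p, x_old)        # protect the old input direction

rng = np.random.default_rng(0)
grad = rng.normal(size=(in_dim, out_dim))
delta = project_gradient(p, grad)

drift = np.abs(x_old @ delta).max()             # change in the old output
rank = np.linalg.matrix_rank(p, tol=1e-3)       # one direction consumed
```

Each protected input direction lowers the effective rank of the projector by one in this toy setting, which gives one way to read a plot of projector rank over tasks: the remaining rank indicates how much room is left for new updates.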