Degradation of Feature Space in Continual Learning

Chiara Lanza; Roberto Pereira; Marco Miozzo; Eduard Angelats; Paolo Dini

TL;DR

This work investigates whether enforcing isotropy in feature spaces benefits continual learning (CL). Using contrastive CL methods and isotropy metrics on CIFAR-10/100, it shows that isotropy regularization often harms downstream accuracy and that isotropy in CL does not reliably predict representation quality. The study introduces IsoEntropy alongside IsoScore and synthetic baselines to interpret high-dimensional isotropy, and evaluates several CL techniques (e.g., Co2L, SupCon, SupCP, NCI) with and without isotropy regularization. The findings caution against treating isotropy as a universal inductive bias in CL and highlight the need for geometry-aware regularizers that account for the non-stationary, sequential nature of continual tasks.

Abstract

Centralized training is the standard paradigm in deep learning, enabling models to learn from a unified dataset in a single location. In such a setup, isotropic feature distributions naturally arise as a means to support well-structured and generalizable representations. In contrast, continual learning operates on streaming, non-stationary data and trains models incrementally, inherently facing the well-known plasticity-stability dilemma. In such settings, learning dynamics tend to yield an increasingly anisotropic feature space. This raises a fundamental question: should isotropy be enforced to achieve a better balance between stability and plasticity, and thereby mitigate catastrophic forgetting? In this paper, we investigate whether promoting feature-space isotropy can enhance representation quality in continual learning. Through experiments using contrastive continual learning techniques on CIFAR-10 and CIFAR-100 data, we find that isotropic regularization fails to improve, and can in fact degrade, model accuracy in continual settings. Our results highlight essential differences in feature geometry between centralized and continual learning, suggesting that isotropy, while beneficial in centralized setups, may not constitute an appropriate inductive bias for non-stationary learning scenarios.

Paper Structure (11 sections, 6 equations, 2 figures, 2 tables)

Introduction
Background and motivation
Feature Space Geometry
Measurements of Isotropy and features geometry
Synthetic Baselines for Isotropy Measurements
Numerical Evaluation
Learning scenarios
Learning techniques analyzed
Isotropy in the CL methods
Isotropy as regularization term
Conclusions

Figures (2)

Figure 1: t-SNE visualization of the CIFAR-10 dataset with centralized learning and three different CL (Co2L) scenarios: $50+50$ (2 experiences of 5 classes each), $40+30+30$ (3 experiences of 4, 3, and 3 classes), and $20\times5$ (5 experiences of 2 classes each).
Figure 2: Comparison of IsoEntropy and IsoScore levels for synthetic distributions and CL methods. For synthetic data, IsoEntropy and IsoScore are depicted using solid and dashed lines, respectively. For CL methods, solid and striped bars represent IsoEntropy and IsoScore, respectively.
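
To make the idea of comparing isotropy levels on synthetic distributions concrete, the sketch below computes a generic isotropy proxy: the normalized entropy of the eigenvalue spectrum of the feature covariance. This is an illustrative assumption, not the paper's definition of IsoEntropy or IsoScore, and the function name `spectral_entropy_isotropy`, the Gaussian test data, and the scaling schedule are all hypothetical choices made for this example. It assumes at least two feature dimensions and non-constant features.

```python
import numpy as np


def spectral_entropy_isotropy(features: np.ndarray) -> float:
    """Normalized entropy of the covariance eigenvalue spectrum.

    Hypothetical proxy for isotropy (not the paper's metric):
    values near 1 mean variance is spread evenly across directions,
    values near 0 mean it is concentrated in a few directions.
    """
    # Center the features so the covariance reflects spread, not offset.
    centered = features - features.mean(axis=0, keepdims=True)
    cov = np.cov(centered, rowvar=False)  # shape: (dim, dim)
    # eigvalsh is appropriate for symmetric matrices; clip tiny negatives.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    p = eigvals / eigvals.sum()  # share of variance per direction
    p = p[p > 0]  # drop zero entries so log(p) is defined
    entropy = -(p * np.log(p)).sum()
    # Divide by the maximum possible entropy, log(dim), to land in [0, 1].
    return float(entropy / np.log(features.shape[1]))


rng = np.random.default_rng(0)
dim, n_samples = 32, 5000

# Synthetic isotropic baseline: unit-variance Gaussian in every direction.
isotropic = rng.standard_normal((n_samples, dim))

# Synthetic anisotropic baseline: per-dimension scales decay geometrically.
scales = np.geomspace(1.0, 0.01, dim)
anisotropic = isotropic * scales

iso_level = spectral_entropy_isotropy(isotropic)
aniso_level = spectral_entropy_isotropy(anisotropic)
print(f"isotropic: {iso_level:.3f}, anisotropic: {aniso_level:.3f}")
```

Under these assumptions the isotropic Gaussian should land close to 1 and the geometrically scaled features noticeably lower, which is the kind of reference range that synthetic baselines provide when reading isotropy values measured on learned features.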
