Degradation of Feature Space in Continual Learning
Chiara Lanza, Roberto Pereira, Marco Miozzo, Eduard Angelats, Paolo Dini
TL;DR
This work investigates whether enforcing isotropy in feature spaces benefits continual learning (CL). Using contrastive CL methods and isotropy metrics on CIFAR-10/100, it shows that isotropy regularization often harms downstream accuracy and that isotropy in CL does not reliably predict representation quality. The study introduces IsoEntropy alongside IsoScore and synthetic baselines to interpret high-dimensional isotropy, and evaluates several CL techniques (e.g., Co2L, SupCon, SupCP, NCI) with and without isotropy regularization. The findings caution against treating isotropy as a universal inductive bias in CL and highlight the need for geometry-aware regularizers that account for the non-stationary, sequential nature of continual tasks.
Abstract
Centralized training is the standard paradigm in deep learning, enabling models to learn from a unified dataset in a single location. In such a setup, isotropic feature distributions naturally arise as a means to support well-structured and generalizable representations. In contrast, continual learning operates on streaming, non-stationary data and trains models incrementally, so it inherently faces the well-known plasticity-stability dilemma. In such settings, learning dynamics tend to yield an increasingly anisotropic feature space. This raises a fundamental question: should isotropy be enforced to achieve a better balance between stability and plasticity, and thereby mitigate catastrophic forgetting? In this paper, we investigate whether promoting feature-space isotropy can enhance representation quality in continual learning. Through experiments using contrastive continual learning techniques on CIFAR-10 and CIFAR-100, we find that isotropic regularization fails to improve, and can in fact degrade, model accuracy in continual settings. Our results highlight essential differences in feature geometry between centralized and continual learning, suggesting that isotropy, while beneficial in centralized setups, may not constitute an appropriate inductive bias for non-stationary learning scenarios.
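To make the notion of feature-space isotropy concrete, the following minimal sketch computes a simple isotropy proxy on synthetic features. It is an illustration only: the measure used here is the normalized participation ratio of the feature covariance, (tr C)^2 / (d * tr(C^2)), which is not the IsoScore or IsoEntropy metric used in the paper. The function names `covariance` and `isotropy_proxy` and the synthetic Gaussian data are assumptions introduced for this example. The proxy equals 1 when variance is spread evenly across all d directions and approaches 1/d when a single direction dominates.

```python
import random


def covariance(features):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in features]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def isotropy_proxy(features):
    """Normalized participation ratio (tr C)^2 / (d * tr(C^2)), in [1/d, 1].

    Illustrative proxy only; not the IsoScore or IsoEntropy definition.
    """
    cov = covariance(features)
    d = len(cov)
    trace = sum(cov[i][i] for i in range(d))
    # For a symmetric matrix, tr(C^2) is the sum of squared entries.
    trace_sq = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    return trace ** 2 / (d * trace_sq)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
d, n = 8, 2000

# Isotropic features: equal variance in every direction.
isotropic = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# Anisotropic features: the first direction has a much larger spread.
anisotropic = [
    [rng.gauss(0, 10 if j == 0 else 1) for j in range(d)] for _ in range(n)
]

iso_score = isotropy_proxy(isotropic)
aniso_score = isotropy_proxy(anisotropic)
print(f"isotropic features:   {iso_score:.3f}")
print(f"anisotropic features: {aniso_score:.3f}")
```

On this synthetic data the isotropic features score close to 1 and the anisotropic features score close to the lower bound 1/d. An isotropy regularizer of the kind studied here would push a learned feature space toward the first regime; the paper's finding is that doing so does not reliably help in continual settings.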
