Elastic Feature Consolidation for Cold Start Exemplar-Free Incremental Learning

Simone Magistri; Tomaso Trinci; Albin Soutif-Cormerais; Joost van de Weijer; Andrew D. Bagdanov

TL;DR

Elastic Feature Consolidation addresses Cold Start Exemplar-Free Class Incremental Learning by introducing a non-isotropic, second-order regularization in feature space through the Empirical Feature Matrix ($E_t$), which concentrates regularization on directions most impactful to previous tasks. It couples this with an asymmetric Prototype Rehearsal loss (PR-ACE) and drift-aware prototype updates to preserve backbone plasticity while mitigating forgetting, using Gaussian prototypes to balance learning across seen classes. The key contributions are the analytic formulation of the Empirical Feature Matrix, its use as a feature-space pseudo-metric for regularization, and the integration of EFM-guided prototype drift compensation within an asymmetric rehearsal framework. Empirical results on CIFAR-100, Tiny-ImageNet, and ImageNet-Subset show that EFC outperforms state-of-the-art methods in both Warm Start and especially Cold Start scenarios, achieving stronger plasticity with competitive or reduced storage costs. This approach provides a privacy-preserving, exemplar-free pathway to robust continual learning with practical implications for real-world sequence learning under data constraints.

Abstract

Exemplar-Free Class Incremental Learning (EFCIL) aims to learn from a sequence of tasks without having access to previous task data. In this paper, we consider the challenging Cold Start scenario in which insufficient data is available in the first task to learn a high-quality backbone. This is especially challenging for EFCIL since it requires high plasticity, which causes feature drift that is difficult to compensate for in the exemplar-free setting. To address this problem, we propose a simple and effective approach that consolidates feature representations by regularizing drift in directions highly relevant to previous tasks and employs prototypes to reduce task-recency bias. Our method, called Elastic Feature Consolidation (EFC), exploits a tractable second-order approximation of feature drift based on an Empirical Feature Matrix (EFM). The EFM induces a pseudo-metric in feature space which we use to regularize feature drift in important directions and to update Gaussian prototypes used in a novel asymmetric cross entropy loss which effectively balances prototype rehearsal with data from new tasks. Experimental results on CIFAR-100, Tiny-ImageNet, ImageNet-Subset and ImageNet-1K demonstrate that Elastic Feature Consolidation is better able to learn new tasks by maintaining model plasticity and significantly outperforms the state-of-the-art.
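
The idea of a feature-space pseudo-metric that penalizes drift only in important directions can be illustrated with a small NumPy sketch. It rests on assumptions that are not stated in the text above: the classifier is a single linear softmax layer with weights `W`, and the matrix is taken to be the average over samples of `W^T (diag(p) - p p^T) W`, where `p` is the softmax output. The function names, the toy dimensions, and the optional isotropic term `eta` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def empirical_feature_matrix(feats, W):
    """Assumed form: mean over samples of W^T (diag(p) - p p^T) W.

    feats: (N, n) features, W: (K, n) linear classifier weights.
    Returns a symmetric positive semi-definite (n, n) matrix.
    """
    p = softmax(feats @ W.T)                            # (N, K)
    A = np.diag(p.mean(axis=0)) - (p.T @ p) / len(p)    # (K, K)
    return W.T @ A @ W

def drift_penalty(f_new, f_old, E, eta=0.0):
    """Mean quadratic form d^T (E + eta * I) d over a batch of drifts d."""
    d = f_new - f_old
    M = E + eta * np.eye(E.shape[0])
    return np.einsum("ni,ij,nj->n", d, M, d).mean()

# Toy setup (hypothetical sizes): 16-dim features, 4 classes.
rng = np.random.default_rng(0)
n, K, N = 16, 4, 256
W = rng.normal(size=(K, n))
feats = rng.normal(size=(N, n))

E = empirical_feature_matrix(feats, W)
eigvals, eigvecs = np.linalg.eigh(E)     # eigenvalues in ascending order
v_flat, v_steep = eigvecs[:, 0], eigvecs[:, -1]

def output_change(v, step=1.0):
    # Largest change in any class probability when all features move by step * v.
    before = softmax(feats @ W.T)
    after = softmax((feats + step * v) @ W.T)
    return np.abs(after - before).max()

pen_flat = drift_penalty(feats + v_flat, feats, E)
pen_steep = drift_penalty(feats + v_steep, feats, E)
print("output change, non-principal direction:", output_change(v_flat))
print("output change, principal direction:    ", output_change(v_steep))
print("drift penalty, non-principal / principal:", pen_flat, pen_steep)
```

In this toy setting the matrix has rank at most K - 1, so most directions in feature space carry no penalty and leave the classifier outputs unchanged, while drift along the leading eigenvector is penalized by the corresponding eigenvalue. That is the sense in which such a regularizer is non-isotropic: the backbone stays free to move features wherever the previous-task outputs are insensitive.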

Paper Structure (32 sections, 25 equations, 11 figures, 7 tables, 1 algorithm)

Introduction
Related Work
EFCIL Regularization via the Empirical Feature Matrix
Exemplar-free Class-Incremental Learning (EFCIL)
Weight Regularization and Feature Distillation
The Empirical Feature Matrix
Prototype Rehearsal for Elastic Feature Consolidation
Prototype Rehearsal for Exemplar-free Class-Incremental Learning
Asymmetric Prototype Rehearsal
Prototype Drift Compensation via EFM
Elastic Feature Consolidation with Asymmetric Prototype Rehearsal
Experimental results
Datasets, Metrics, and Hyperparameters
Comparison with the state-of-the-art
Ablation Study and Storage Costs
...and 17 more sections

Figures (11)

Figure 1: Elastic Feature Consolidation. (a) Architecture overview; (b) The Empirical Feature Matrix (EFM) measures how outputs vary with features and identifies important directions to mitigate forgetting (see the section "The Empirical Feature Matrix"); (c) The EFM induces a pseudo-metric in feature space used to estimate prototype drift (see the section "Prototype Drift Compensation via EFM"); and (d) The Asymmetric Prototype Replay loss adapts previous task classifiers to the changing backbone by balancing new-task data and Gaussian prototypes (see the section "Asymmetric Prototype Rehearsal").
Figure 2: The regularizing effects of $E_t$ on Cold Start CIFAR-100 - 10 step (see the section "Datasets, Metrics, and Hyperparameters" for more details on the dataset settings). Left: Perturbing features in principal directions of $E_1$ results in significant changes in classifier outputs (in blue), while perturbations in non-principal directions leave the outputs unchanged (in red). Right: If we continue incremental learning up through task 3 and perturb features from all three tasks in the principal (solid lines) and non-principal (dashed lines) directions of $E_3$, we see that $E_3$ captures all important directions in feature space up through task 3.
Figure 3: Accuracy after each incremental step on the Cold Start CIFAR-100 10-step scenario. Feature distillation with symmetric prototype loss (left) reduces forgetting at the cost of plasticity, and new tasks are not learned. EFM regularization with symmetric loss (middle) increases plasticity at the cost of stability, and previous tasks are forgotten. EFM regularization with asymmetric PR-ACE loss (right) balances current task data with prototypes and achieves a better plasticity/stability trade-off.
Figure 4: Ablations on (a) losses and (b) prototype update for CIFAR-100.
Figure A1: The spectrum of the Empirical Feature Matrix across incremental learning steps. For a better visualization of the spectrum in the analysis we considered a Warm Start 5-step scenario on CIFAR-100. The $x$-axis is truncated at the $120$th eigenvalue.
...and 6 more figures
