EFC++: Elastic Feature Consolidation with Prototype Re-balancing for Cold Start Exemplar-free Incremental Learning

Simone Magistri, Tomaso Trinci, Albin Soutif-Cormerais, Joost van de Weijer, Andrew D. Bagdanov

TL;DR

This work tackles Exemplar-free Class Incremental Learning (EFCIL) under Cold Start by introducing Elastic Feature Consolidation++ (EFC++), a method that regularizes feature drift in directions most relevant to past tasks using the Empirical Feature Matrix (EFM) and decouples backbone learning from classifier calibration via a post-training Prototype Re-balancing phase. EFM provides a tractable second-order approximation of feature drift, enabling selective stabilization while preserving plasticity, and prototypes are updated and used in a re-balancing training step to mitigate inter-task confusion without compromising adaptation to new tasks. Across small-scale, large-scale, and domain-incremental benchmarks, EFC++ consistently outperforms prior exemplar-free approaches, with particularly strong gains in Cold Start scenarios and competitive or superior performance in Warm Start settings. The approach balances stability and plasticity, reduces feature-space drift, and offers practical training-time and memory requirements, making it a promising tool for non-exemplar continual learning in dynamic, privacy-sensitive environments.

Abstract

Exemplar-free Class Incremental Learning (EFCIL) aims to learn from a sequence of tasks without having access to previous task data. In this paper, we consider the challenging Cold Start scenario in which insufficient data is available in the first task to learn a high-quality backbone. This is especially challenging for EFCIL since it requires high plasticity, resulting in feature drift which is difficult to compensate for in the exemplar-free setting. To address this problem, we propose an effective approach to consolidate feature representations by regularizing drift in directions highly relevant to previous tasks while employing prototypes to reduce task-recency bias. Our approach, which we call Elastic Feature Consolidation++ (EFC++), exploits a tractable second-order approximation of feature drift based on a proposed Empirical Feature Matrix (EFM). The EFM induces a pseudo-metric in feature space which we use to regularize feature drift in important directions and to update Gaussian prototypes. In addition, we introduce a post-training prototype re-balancing phase that updates classifiers to compensate for feature drift. Experimental results on CIFAR-100, Tiny-ImageNet, ImageNet-Subset, ImageNet-1K, and DomainNet demonstrate that EFC++ is better able to learn new tasks by maintaining model plasticity and significantly outperforms the state-of-the-art.
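To make the idea of a pseudo-metric on feature drift more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear softmax classifier on top of the features and a Fisher-style form for the matrix, $W^\top(\mathrm{diag}(p) - pp^\top)W$ averaged over samples; this summary does not state the exact definition of the EFM, so that form, along with the helper names `efm_single`, `empirical_feature_matrix`, and `drift_penalty` and the toy values of `W` and `feats`, are assumptions made for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, f):
    return [sum(w * x for w, x in zip(row, f)) for row in W]

def efm_single(W, f):
    """Assumed per-sample matrix W^T (diag(p) - p p^T) W for feature vector f."""
    p = softmax(matvec(W, f))
    C, d = len(W), len(f)
    # Fisher information of a softmax with respect to its logits.
    F = [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(C)]
         for i in range(C)]
    # Pull the logit-space matrix back to feature space through W.
    return [[sum(W[i][a] * F[i][j] * W[j][b]
                 for i in range(C) for j in range(C))
             for b in range(d)] for a in range(d)]

def empirical_feature_matrix(W, features):
    """Average the per-sample matrices over a set of feature vectors."""
    d = len(features[0])
    E = [[0.0] * d for _ in range(d)]
    for f in features:
        Ef = efm_single(W, f)
        for a in range(d):
            for b in range(d):
                E[a][b] += Ef[a][b] / len(features)
    return E

def drift_penalty(E, f_old, f_new):
    """Quadratic form (f_new - f_old)^T E (f_new - f_old)."""
    delta = [n - o for n, o in zip(f_new, f_old)]
    d = len(delta)
    return sum(delta[a] * E[a][b] * delta[b]
               for a in range(d) for b in range(d))

# Toy setup (hypothetical): 2 classes, 3-d features; the classifier only
# reads the first feature coordinate.
W = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
feats = [[0.5, 1.0, -1.0], [-0.2, 0.3, 2.0]]
E = empirical_feature_matrix(W, feats)
f_old = [0.0, 0.0, 0.0]
print(drift_penalty(E, f_old, [1.0, 0.0, 0.0]))  # drift the classifier sees
print(drift_penalty(E, f_old, [0.0, 1.0, 0.0]))  # drift the classifier ignores
```

In this toy case, drift along the second coordinate costs nothing because the classifier output does not depend on it. This is why such a quadratic form is only a pseudo-metric: distinct feature vectors can be at distance zero, which leaves those directions free for learning new tasks.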

Paper Structure

This paper contains 40 sections, 30 equations, 17 figures, 10 tables, 2 algorithms.

Figures (17)

  • Figure 1: Plasticity potential in Cold and Warm Start. We train a ResNet-18 on $\mathcal{C}_0$ classes of CIFAR-100 for $\mathcal{C}_0 = 10, 20, \dots, 50$, and evaluate feature quality via linear probing \cite{davari2022probing} on all 100 classes. The plasticity potential $\Delta$, defined as the maximum performance gain which can be obtained with a plastic versus a frozen backbone, is quantified as the performance gap between Joint Training and a model frozen on a subset of classes. Note that $\Delta_{\text{Cold}}$ (the Cold Start scenario) is significantly larger than $\Delta_{\text{Warm}}$ (the Warm Start case). This is due to the inability to learn a strong feature extractor on only $\mathcal{C}_0$ classes and hence greater plasticity is required to incrementally learn new classes. Freezing the backbone in such settings limits adaptability and ultimately constrains performance on subsequent tasks.
  • Figure 2: Elastic Feature Consolidation with Prototype re-balancing (EFC++). (a) EFC++ leverages the Empirical Feature Matrix (EFM) to mitigate drift in feature representations by identifying important directions for previous tasks to reduce forgetting while enhancing plasticity for learning new tasks (Section \ref{sec:empirical_feat}). In this phase, the feature extractor $f_t$ and the current task classifier with weights $W_{t-1:t}$ are trained with EFM regularization and cross-entropy loss (Section \ref{sec:proto-balance}). (b) After training, EFC++ uses the EFM to update the prototypes of previous task classes based on the drift induced by the most recent task (Section \ref{sec:drift-compensation}). (c) EFC++ uses Gaussian prototypes, together with current task features, for training previous and current task classifiers with weights $W_t = [W_{t-1},W_{t-1:t}]$ via a prototype re-balancing phase (Section \ref{sec:proto-balance}). (d) Before training on the next task, the new EFM and the prototypes of the current task classes are computed.
  • Figure 3: The regularizing effects of $E_t$ on the Cold Start CIFAR-100 10-step and 20-step scenarios (see Section \ref{sec:experimental_setting} for details on dataset settings). Left: Perturbing features in the principal directions of $E_1$ results in significant changes in classifier outputs (in blue), while perturbations in non-principal directions leave the outputs unchanged (in red). Middle: If we continue incremental learning up through task 3 and perturb features from all three tasks in the principal (solid lines) and non-principal (dashed lines) directions of $E_3$, we see that $E_3$ captures all important directions in feature space up through task 3. Right: At the end of training, we observe the same behavior: the last per-step accuracy (see Eq. \ref{eq:metrics}), representing the average accuracy over all tasks after the last training session, decreases only when perturbed in directions of $E_{10}$ or $E_{20}$ relevant for previous tasks in the 10-step and 20-step scenarios, respectively.
  • Figure 4: Accuracy after each incremental step on the Cold Start CIFAR-100 10-step scenario. Left: In EFC, which combines EFM regularization with the asymmetric PR-ACE loss to balance current task data with prototypes during training, older tasks are forgotten more quickly than more recent ones. Right: EFC++, which applies EFM regularization during backbone training and a Prototype Re-balancing phase post-training, achieves a better plasticity-stability trade-off.
  • Figure 5: Average drift of the class means in the relevant directions of the EFM, measured before and after training the task in which they are involved, for both EFC and EFC++ on CIFAR-100 (CS) 10-step. EFC++ consistently exhibits less drift than EFC, especially in the initial tasks, where the class drift under EFC is roughly double. This confirms that EFC++ better controls the drift along relevant directions.
  • ...and 12 more figures
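
The prototype re-balancing step described in Figure 2(c) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: it assumes each old class is summarized by a Gaussian with a diagonal covariance (the paper may store full covariances), that each class contributes the same number of samples, and that classifier training then proceeds on the resulting batch. The function names `sample_from_prototype` and `build_rebalancing_batch`, the `per_class` parameter, and all toy values are hypothetical.

```python
import random

def sample_from_prototype(mean, std, n, rng):
    """Draw n pseudo-features from a diagonal Gaussian prototype (assumption)."""
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n)]

def build_rebalancing_batch(prototypes, current_feats, current_labels,
                            per_class, rng):
    """Mix sampled pseudo-features of old classes with real current features."""
    feats, labels = [], []
    # Old classes: no stored exemplars, so sample from their prototypes.
    for cls, (mean, std) in prototypes.items():
        feats.extend(sample_from_prototype(mean, std, per_class, rng))
        labels.extend([cls] * per_class)
    # Current classes: subsample real features to the same per-class count.
    by_class = {}
    for f, y in zip(current_feats, current_labels):
        by_class.setdefault(y, []).append(f)
    for cls, fs in by_class.items():
        chosen = rng.sample(fs, min(per_class, len(fs)))
        feats.extend(chosen)
        labels.extend([cls] * len(chosen))
    return feats, labels

# Toy usage (hypothetical values): two old classes kept as prototypes,
# two current classes with four real feature vectors each.
rng = random.Random(0)
prototypes = {0: ([0.0, 0.0], [1.0, 1.0]), 1: ([5.0, 5.0], [0.5, 0.5])}
current_feats = [[float(i), float(-i)] for i in range(8)]
current_labels = [2, 2, 2, 2, 3, 3, 3, 3]
batch_feats, batch_labels = build_rebalancing_batch(
    prototypes, current_feats, current_labels, per_class=3, rng=rng)
print(len(batch_feats), sorted(set(batch_labels)))
```

Balancing the per-class counts is one simple way to counter task-recency bias when calibrating the classifiers; the actual sampling proportions used by EFC++ are not given in this summary.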