Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning

Dipam Goswami; Albin Soutif--Cormerais; Yuyang Liu; Sandesh Kamath; Bartłomiej Twardowski; Joost van de Weijer

TL;DR

This work tackles exemplar-free class-incremental learning (EFCIL) in the small-start setting, where the first task is small and no exemplars from old tasks can be stored. It introduces Adversarial Drift Compensation (ADC), which perturbs current-task samples toward old class prototypes in the old feature space, uses the perturbed samples to estimate how far each prototype drifts, and compensates for that drift in the new feature space. The estimate relies on continual adversarial transferability: samples that are adversarial for the old model remain close to the old class in the new model. The approach uses a simple Nearest-Class Mean (NCM) classifier with a distillation-based training objective and reports state-of-the-art performance on CIFAR-100, TinyImageNet, ImageNet-Subset, and the fine-grained datasets CUB-200 and Stanford Cars, with modest computational overhead. The results suggest that adversarially generated pseudo-exemplars can resurrect old class representations without violating the exemplar-free constraint, which makes the method practical for continual learning in privacy- or regulation-sensitive contexts.

Abstract

Continual learning methods are known to suffer from catastrophic forgetting, a phenomenon that is particularly hard to counter for methods that do not store exemplars of previous tasks. Therefore, to reduce potential drift in the feature extractor, existing exemplar-free methods are typically evaluated in settings where the first task is significantly larger than subsequent tasks. Their performance drops drastically in more challenging settings starting with a smaller first task. To address this problem of feature drift estimation for exemplar-free methods, we propose to adversarially perturb the current samples such that their embeddings are close to the old class prototypes in the old model embedding space. We then estimate the drift in the embedding space from the old to the new model using the perturbed images and compensate the prototypes accordingly. We exploit the fact that adversarial samples are transferable from the old to the new feature space in a continual learning setting. The generation of these images is simple and computationally cheap. We demonstrate in our experiments that the proposed approach better tracks the movement of prototypes in embedding space and outperforms existing methods on several standard continual learning benchmarks as well as on fine-grained datasets. Code is available at https://github.com/dipamgoswami/ADC.
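The perturbation step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). It assumes a toy linear "old model" `W_old` whose gradient can be written by hand, a squared Euclidean distance to the target prototype as the attack loss, and a normalized gradient step with a hypothetical step size and iteration count. With a real network, the gradient would come from automatic differentiation instead.

```python
import math

def matvec(W, x):
    """Return W @ x for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist_and_grad(W, x, proto):
    """Squared L2 distance ||W x - proto||^2 and its gradient w.r.t. x."""
    residual = [f - p for f, p in zip(matvec(W, x), proto)]
    loss = sum(r * r for r in residual)
    # d/dx ||W x - p||^2 = 2 W^T (W x - p)
    grad = [2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(x))]
    return loss, grad

def perturb_toward_prototype(W_old, x, proto, step=0.05, iters=50):
    """Targeted attack: move x so its old-space embedding nears proto.

    step and iters are illustrative values, not taken from the paper.
    """
    x_adv = list(x)
    for _ in range(iters):
        _, grad = sq_dist_and_grad(W_old, x_adv, proto)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm < 1e-12:  # embedding already sits on the prototype
            break
        # Normalized gradient descent step on the input itself.
        x_adv = [xi - step * g / norm for xi, g in zip(x_adv, grad)]
    return x_adv

# Toy data: a 3-d "image", a 2-d embedding, one old-class prototype.
W_old = [[1.0, 0.5, 0.0], [0.0, 1.0, -0.5]]
old_prototype = [2.0, -1.0]
x_new = [0.0, 0.0, 0.0]

loss_before, _ = sq_dist_and_grad(W_old, x_new, old_prototype)
x_adv = perturb_toward_prototype(W_old, x_new, old_prototype)
loss_after, _ = sq_dist_and_grad(W_old, x_adv, old_prototype)
print(f"distance^2 before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Only the input is updated and the model weights stay fixed, which is why the abstract can describe generating these samples as cheap: the cost is a handful of forward and backward passes through the frozen old model.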

Paper Structure (18 sections, 6 equations, 9 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Method
Motivation
Adversarial Drift Estimation
Drift Compensation
Training Strategy
Experiments
Quantitative Evaluation
Computational overhead of ADC
Ablation Studies
Conclusions
Acknowledgement.
Training settings and hyperparameters
Robustness to different class orders
...and 3 more sections

Figures (9)

Figure 1: Illustration of Adversarial Drift Compensation (ADC) and SDC (Yu et al., CVPR 2020). In SDC, the drift $\Delta^k_{t-1 \to t}$ is estimated as the average drift of all new-task samples after training on a new task. Instead, we propose to move the new-task features close to the old prototype $P^k_{t-1}$ of class $k$ by perturbing the new images using targeted adversarial attacks. The drift of the adversarial samples from the old to the new feature space is used to resurrect all old prototypes.
Figure 2: The cosine distance between embeddings and the old prototype in the old feature space is correlated with the cosine distance between embeddings and the oracle prototype in the new feature space. This holds for embeddings of both initial and adversarial samples. For demonstration, we select a few current-task samples that are closest to the old prototype and choose the same target old class for all samples. The blue and orange points represent the unmodified current-class samples and the samples modified using our proposed approach, respectively. In this analysis, we compute the oracle prototype using all old-task data in the new feature space.
Figure 3: (a) Adversarial Sample Generation: In the old model's feature space, the new samples closest to the old prototype are selected and iteratively perturbed in the direction of the target old prototype. The resulting adversarial samples are misclassified as the target old class, so their embeddings lie closer to the old prototype. We perform this for every old class (2 classes are shown here for demonstration). (b) Model Training with Drift Compensation: The new model is trained using a classification loss to learn new classes and a knowledge distillation loss to prevent forgetting of old classes. After the new model is trained, the adversarial samples generated using the old model are passed through both models, and the drift from the old to the new feature space is estimated. This drift is then used to update the old prototypes.
Figure 4: Memory size vs. accuracy comparison of NME and ADC on the CIFAR-100 and TinyImageNet (T=10) settings.
Figure 5: Accuracy after each incremental task for the CIFAR-100, TinyImageNet and CUB-200 datasets in the 10-task setting. ADC improves over the compared methods from the initial task through the last.
...and 4 more figures
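The drift compensation step in Figure 3(b) can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the drift of a class is the plain average displacement of that class's adversarial samples between the two feature spaces, that the old prototype is shifted by this average, and that classification uses Euclidean nearest-class-mean. The function names and toy feature values are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def estimate_drift(old_feats, new_feats):
    """Average displacement of the same samples from old to new space."""
    diffs = [[n - o for n, o in zip(new, old)]
             for old, new in zip(old_feats, new_feats)]
    return mean_vector(diffs)

def compensate(prototype, drift):
    """Shift an old prototype by the estimated drift."""
    return [p + d for p, d in zip(prototype, drift)]

def ncm_predict(feature, prototypes):
    """Nearest-class-mean: label of the closest prototype (Euclidean)."""
    return min(prototypes, key=lambda k: math.dist(feature, prototypes[k]))

# Toy features of two adversarial samples targeting the class "old",
# embedded by the old model and by the new model (illustrative values).
adv_old_space = [[1.0, 0.25], [1.0, -0.25]]
adv_new_space = [[1.0, 1.25], [1.0, 0.75]]

old_prototype = [1.0, 0.0]
drift = estimate_drift(adv_old_space, adv_new_space)
resurrected = compensate(old_prototype, drift)

# A query from the old class, embedded by the new model.
query = [1.0, 1.0]
stale = {"old": old_prototype, "new": [1.5, 1.5]}
updated = {"old": resurrected, "new": [1.5, 1.5]}
print("stale prototypes  ->", ncm_predict(query, stale))
print("updated prototypes ->", ncm_predict(query, updated))
```

In this toy case the stale prototype loses the query to the new class, while the compensated prototype recovers it. No old-class image is involved at any point; only the stored prototype and perturbed new-task samples are used.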
