Exemplar-free Continual Representation Learning via Learnable Drift Compensation

Alex Gomez-Villa, Dipam Goswami, Kai Wang, Andrew D. Bagdanov, Bartlomiej Twardowski, Joost van de Weijer

TL;DR

This work tackles exemplar-free continual representation learning from scratch, where class prototypes drift as the backbone is updated, causing forgetting. It shows that much of the forgetting stems from prototype drift rather than from a loss of discriminative power, and introduces Learnable Drift Compensation (LDC): a forward projector $p_F^{t}$ that maps old features to the new feature space and updates old prototypes via $P_t^c = p_F^{t}(P_{t-1}^c)$, without needing old data or labels. LDC is modular and compatible with supervised and self-supervised CL, enabling the first exemplar-free semi-supervised continual learning approach. It reports state-of-the-art results on CIFAR-100, Tiny-ImageNet, ImageNet100, and Stanford Cars with ViT variants. Empirical results, ablations, and comparisons against drift-correction baselines indicate that LDC tracks prototype positions under moving backbones, offering a memory-efficient, plug-and-play addition to existing continual learning methods.
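Written out, the projector and the prototype update can be stated as below. This is a sketch assembled from the description in the Figure 3 caption (mean squared error between projected $f_{t-1}$ features and $f_t$ features); the symbol $D_t$ for the current-task data and the $1/|D_t|$ averaging are notational assumptions, not taken from the paper's own equations.

$$
p_F^{t} = \arg\min_{p} \; \frac{1}{|D_t|} \sum_{x \in D_t} \left\lVert p\big(f_{t-1}(x)\big) - f_t(x) \right\rVert_2^2,
\qquad
P_t^c = p_F^{t}\big(P_{t-1}^c\big) \ \text{ for every old class } c .
$$

Only current-task inputs $x$ appear in the objective, which is why no old samples or labels are required: both extractors are evaluated on the same new data, and the fitted map is then applied to the stored prototypes.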

Abstract

Exemplar-free class-incremental learning using a backbone trained from scratch and starting from a small first task presents a significant challenge for continual representation learning. Prototype-based approaches, when continually updated, face the critical issue of semantic drift, whereby the old class prototypes move to different positions in the new feature space. Through an analysis of prototype-based continual learning, we show that forgetting is not due to diminished discriminative power of the feature extractor, and can potentially be corrected by drift compensation. To address this, we propose Learnable Drift Compensation (LDC), which can effectively mitigate drift in any moving backbone, whether supervised or unsupervised. LDC is fast and straightforward to integrate on top of existing continual learning approaches. Furthermore, we showcase how LDC can be applied in combination with self-supervised CL methods, resulting in the first exemplar-free semi-supervised continual learning approach. We achieve state-of-the-art performance in both supervised and semi-supervised settings across multiple datasets. Code is available at https://github.com/alviur/ldc.

Paper Structure (20 sections, 4 equations, 5 figures, 6 tables, 1 algorithm)


Figures (5)

  • Figure 1: Last task accuracy of class-prototype accumulation strategies using an NCM classifier in the 10-task CIFAR-100 scenario. The panels show regularization-based methods (a) LwF and (b) CaSSLe. We show results for two settings: Tasks 1&2 shows how the performance on the first two tasks evolves while incrementally training all tasks. Analogously, Tasks 1&2&3 shows how the performance on the first three tasks evolves over training of all ten tasks.
  • Figure 2: Feature drift estimation after applying random translations, rotations and scaling on three sample 2D distributions. We aim to estimate the true mean of $C_1$ at the end of $t_2$ using $D_{t_2}$. SDC assumes locally that transformations can be captured by translations. LDC can handle rotation and scaling in feature space. Note that LDC (cyan) more accurately approximates the real distribution mean (green).
  • Figure 3: Learnable Drift Compensation. We train the model on the current task data using regularization-based, self-supervised or supervised continual representation learning methods. After training the new feature extractor $f_t$, we learn a forward projector $p_F$ by minimizing the mean squared error between the projected features from $f_{t-1}$ and the features from $f_t$. We use the learned projector to compensate for the drift of the old class prototypes in the new feature space.
  • Figure 4: Comparison of LDC with NME for varying memory size in the supervised settings on CIFAR-100, Tiny-ImageNet and ImageNet100.
  • Figure 5: Drift compensation analysis using SDC and LDC in supervised settings on CIFAR-100. We compute the cosine distance between the updated and oracle prototypes. We plot the distributions of the cosine distances for all old classes using both methods after alternate tasks. While SDC fails to estimate prototypes close to the oracle (the distance grows for many classes), LDC estimates stay very close to the oracle prototypes (the mean of the LDC distributions is close to 0 even after the last task, with low standard deviation).
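
The ideas in the Figure 2, 3 and 5 captions can be illustrated with a small NumPy toy. It is a sketch under stated assumptions, not the authors' implementation: the drift is a synthetic 2D rotation, scaling and translation; the forward projector is an affine map fitted in closed form by least squares (the paper's projector architecture and optimizer are not given here); and the translation-only baseline is a simplified stand-in for the translation assumption attributed to SDC, not SDC itself. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical drift between feature spaces: rotation + scaling + translation.
theta = np.pi / 4
A_true = 1.5 * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
b_true = np.array([0.5, -1.0])

def drift(z):
    """Toy stand-in for the change from f_{t-1} features to f_t features."""
    return z @ A_true.T + b_true

# Current-task samples seen through both extractors (synthetic stand-ins).
z_old = rng.normal(size=(500, 2)) + np.array([2.0, 0.0])
z_new = drift(z_old) + 0.01 * rng.normal(size=(500, 2))

def fit_forward_projector(z_old, z_new):
    """Affine p_F minimising the MSE between p_F(z_old) and z_new."""
    X = np.hstack([z_old, np.ones((len(z_old), 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, z_new, rcond=None)
    return W  # shape (d + 1, d)

def project(W, z):
    """Apply the fitted projector to one or more feature vectors."""
    z = np.atleast_2d(z)
    return np.hstack([z, np.ones((len(z), 1))]) @ W

def cosine_distance(a, b):
    """1 - cosine similarity, the metric named in the Figure 5 caption."""
    a, b = np.ravel(a), np.ravel(b)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Prototype of an old class, stored at task t-1; no old samples are kept.
proto_old = np.array([-3.0, 1.0])
# The toy drift is affine, so the true new class mean is drift(proto_old).
oracle = drift(proto_old)

W = fit_forward_projector(z_old, z_new)
proto_ldc = project(W, proto_old)[0]
# Translation-only baseline: shift by the mean displacement of current data.
proto_shift = proto_old + (z_new - z_old).mean(axis=0)

print("learned projector :", cosine_distance(proto_ldc, oracle))
print("translation only  :", cosine_distance(proto_shift, oracle))
```

Because the toy drift is affine and the projector family is affine, the fit recovers the drift almost exactly, so the projected prototype lands near the oracle while the pure shift does not. Real feature drift is not guaranteed to be affine, so this only illustrates the mechanism and says nothing about accuracy on actual benchmarks.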