FeCAM: Exploiting the Heterogeneity of Class Distributions in Exemplar-Free Continual Learning

Dipam Goswami, Yuyang Liu, Bartłomiej Twardowski, Joost van de Weijer

TL;DR

Exemplar-free continual learning remains challenging due to non-stationary, heterogeneous class distributions when the backbone is frozen. FeCAM reframes prototype-based classification as a Bayes detector using per-class covariance matrices and an anisotropic Mahalanobis distance, augmented by correlation normalization, covariance shrinkage, and Tukey transformations to stabilize estimates. This approach yields non-linear, data-driven decision boundaries that better separate old and new classes, outperforming linear and Gaussian-sampling baselines across many-shot and few-shot benchmarks, and extending to domain-incremental scenarios with pretrained vision transformers. FeCAM achieves state-of-the-art results without storing exemplars, offering strong practical impact for privacy-preserving, scalable continual learning systems.
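For reference, the contrast between the two metrics can be written out with the textbook definitions of the squared distances. The symbols are notation chosen here and are not taken from the summary above: $f$ is the frozen feature extractor, $\mu_y$ the prototype (mean feature) of class $y$, and $\Sigma_y$ its feature covariance matrix.

```latex
D_E(x, y) = \bigl(f(x) - \mu_y\bigr)^\top \bigl(f(x) - \mu_y\bigr),
\qquad
D_M(x, y) = \bigl(f(x) - \mu_y\bigr)^\top \Sigma_y^{-1} \bigl(f(x) - \mu_y\bigr)
```

The Euclidean distance is the special case $\Sigma_y = I$, which is why it fits isotropic (spherical) class distributions but not anisotropic ones.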

Abstract

Exemplar-free class-incremental learning (CIL) poses several challenges since it prohibits the rehearsal of data from previous tasks and thus suffers from catastrophic forgetting. Recent approaches to incrementally learning the classifier by freezing the feature extractor after the first task have gained much attention. In this paper, we explore prototypical networks for CIL, which generate new class prototypes using the frozen feature extractor and classify the features based on the Euclidean distance to the prototypes. In an analysis of the feature distributions of classes, we show that classification based on Euclidean metrics is successful for jointly trained features. However, when learning from non-stationary data, we observe that the Euclidean metric is suboptimal and that feature distributions are heterogeneous. To address this challenge, we revisit the anisotropic Mahalanobis distance for CIL. In addition, we empirically show that modeling the feature covariance relations is better than previous attempts at sampling features from normal distributions and training a linear classifier. Unlike existing methods, our approach generalizes to both many- and few-shot CIL settings, as well as to domain-incremental settings. Interestingly, without updating the backbone network, our method obtains state-of-the-art results on several standard continual learning benchmarks. Code is available at https://github.com/dipamgoswami/FeCAM.
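The difference between prototype classification with a Euclidean metric and with a per-class Mahalanobis metric can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that): the function names, the synthetic 2-D data, and the simple `gamma * I` shrinkage term are assumptions made for illustration, and the correlation normalization and Tukey transformation mentioned in the TL;DR are left out.

```python
import numpy as np

def fit_class_stats(features, labels, gamma=0.01):
    """Estimate a prototype (mean) and an inverse covariance per class.

    gamma * I is a simple shrinkage term (an assumption of this sketch)
    that keeps the covariance matrix invertible.
    """
    stats = {}
    dim = features.shape[1]
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + gamma * np.eye(dim)
        stats[int(c)] = (mean, np.linalg.inv(cov))
    return stats

def predict_euclidean(stats, q):
    """Nearest class mean under the squared Euclidean distance."""
    return min(stats, key=lambda c: np.sum((q - stats[c][0]) ** 2))

def predict_mahalanobis(stats, q):
    """Nearest class mean under the squared per-class Mahalanobis distance."""
    def dist(c):
        mean, precision = stats[c]
        diff = q - mean
        return diff @ precision @ diff
    return min(stats, key=dist)

# Synthetic features: class 0 is strongly elongated along the first axis,
# class 1 is a small isotropic cluster centred at (3, 2).
rng = np.random.default_rng(0)
class0 = rng.normal(size=(500, 2)) * np.array([4.0, 0.3])
class1 = rng.normal(size=(500, 2)) * 0.3 + np.array([3.0, 2.0])
features = np.vstack([class0, class1])
labels = np.array([0] * 500 + [1] * 500)

stats = fit_class_stats(features, labels)

# The query lies on the long axis of class 0 but is closer, in straight-line
# distance, to the mean of class 1.
query = np.array([4.0, 0.0])
euclid_pred = predict_euclidean(stats, query)
maha_pred = predict_mahalanobis(stats, query)
print("Euclidean:", euclid_pred, "Mahalanobis:", maha_pred)
```

In this toy setup the Euclidean rule assigns the query to class 1 because only the distance to the mean matters, while the Mahalanobis rule assigns it to class 0 because the query is well within that class's spread along its long axis. No stored exemplars are needed at prediction time, only the per-class means and covariance matrices.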

Paper Structure (19 sections, 13 equations, 9 figures, 10 tables, 1 algorithm)


Figures (9)

  • Figure 1: Illustration of feature representations in CIL settings. In joint training (a), deep neural networks learn good isotropic, spherical representations (Guerriero et al., 2018), so the Euclidean metric can be used effectively. However, it is challenging to learn isotropic representations of both old and new classes in CIL settings. When the model is too stable (b), it cannot learn good spherical representations of new classes; when it is too plastic (c), it learns spherical representations of new classes but loses those of old classes. The isotropic Euclidean distance is therefore suboptimal. We propose FeCAM (d), which models the feature covariance relations using the Mahalanobis metric and learns better non-linear decision boundaries for new classes.
  • Figure 2: Illustration of distances (contour lines indicate points at equal distance from prototype).
  • Figure 3: (a) Singular values comparison for old and new classes, (b-c) Visualization of features for old classes and new classes by t-SNE, where the colors of points indicate the corresponding classes.
  • Figure 4: Accuracy comparison of NCM (Euclidean) and FeCAM (Mahalanobis), using either a common covariance matrix or one matrix per class, on the CIFAR100 50-50 (2-task) sequence, for Old, New, and All classes at the end of the learning sequence.
  • Figure 5: Average accuracy comparison of the Bayesian and linear classifiers in the CIFAR100 (T=5) setting.
  • ...and 4 more figures