FeCAM: Exploiting the Heterogeneity of Class Distributions in Exemplar-Free Continual Learning
Dipam Goswami, Yuyang Liu, Bartłomiej Twardowski, Joost van de Weijer
TL;DR
Exemplar-free continual learning remains challenging due to non-stationary, heterogeneous class distributions when the backbone is frozen. FeCAM reframes prototype-based classification as a Bayes detector using per-class covariance matrices and an anisotropic Mahalanobis distance, augmented by correlation normalization, covariance shrinkage, and Tukey transformations to stabilize estimates. This approach yields non-linear, data-driven decision boundaries that better separate old and new classes, outperforming linear and Gaussian-sampling baselines across many-shot and few-shot benchmarks, and extending to domain-incremental scenarios with pretrained vision transformers. FeCAM achieves state-of-the-art results without storing exemplars, offering strong practical impact for privacy-preserving, scalable continual learning systems.
Abstract
Exemplar-free class-incremental learning (CIL) poses several challenges since it prohibits the rehearsal of data from previous tasks and thus suffers from catastrophic forgetting. Recent approaches to incrementally learning the classifier by freezing the feature extractor after the first task have gained much attention. In this paper, we explore prototypical networks for CIL, which generate new class prototypes using the frozen feature extractor and classify the features based on the Euclidean distance to the prototypes. In an analysis of the feature distributions of classes, we show that classification based on Euclidean metrics is successful for jointly trained features. However, when learning from non-stationary data, we observe that the Euclidean metric is suboptimal and that feature distributions are heterogeneous. To address this challenge, we revisit the anisotropic Mahalanobis distance for CIL. In addition, we empirically show that modeling the feature covariance relations is better than previous attempts at sampling features from normal distributions and training a linear classifier. Unlike existing methods, our approach generalizes to both many- and few-shot CIL settings, as well as to domain-incremental settings. Interestingly, without updating the backbone network, our method obtains state-of-the-art results on several standard continual learning benchmarks. Code is available at https://github.com/dipamgoswami/FeCAM.
