Diffusion Model Meets Non-Exemplar Class-Incremental Learning and Beyond

Jichuan Zhang; Yali Li; Xin Liu; Shengjin Wang

TL;DR

This work tackles non-exemplar class-incremental learning by addressing the representation shift and memory constraints that arise when old exemplars are unavailable. It introduces DiffFR, which keeps the feature extractor frozen and combines three components: similarity-driven self-supervision for a generalizable feature extractor, a compact diffusion model that generates class-conditioned old features for replay, and prototype calibration, which focuses the diffusion model on the shape of feature distributions rather than on exact prototypes. Empirical results on CIFAR-100, TinyImageNet, and ImageNet-Subset show DiffFR achieving state-of-the-art average incremental accuracy (e.g., improvements of around 2–4 percentage points) and reduced forgetting, with additional gains under enhanced data augmentation and in domain-incremental settings. The method is memory-efficient and exemplar-free, and it generalizes well to new classes and domain shifts, which makes it relevant to continual learning in privacy-conscious or data-limited scenarios. The code is to be released.

Abstract

Non-exemplar class-incremental learning (NECIL) aims to resist catastrophic forgetting without saving old class samples. Prior methods generally employ simple rules to generate features for replay and therefore suffer from a large distribution gap between replayed features and real ones. To address this issue, we propose a simple yet effective Diffusion-based Feature Replay (DiffFR) method for NECIL. First, to alleviate the limited representational capacity caused by fixing the feature extractor, we employ Siamese-based self-supervised learning to obtain generalizable initial features. Second, we devise diffusion models to generate class-representative features that are highly similar to real features, which provides an effective way to memorize knowledge without exemplars. Third, we introduce prototype calibration to direct the diffusion model's focus towards learning the distribution shapes of features, rather than the entire distribution. Extensive experiments on public datasets demonstrate significant performance gains for DiffFR, which outperforms state-of-the-art NECIL methods by 3.0% on average. The code will be made publicly available soon.
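The abstract does not spell out which diffusion formulation is applied to the features, so the following is only a minimal sketch under an assumption: it uses the standard DDPM forward (noising) process with a linear beta schedule, applied to feature vectors instead of images. The function names, the schedule values, and the feature dimension of 64 are hypothetical and chosen for illustration; the denoising network that would be trained to predict `noise` is omitted.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed; the source does not state the schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(features, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    features: (batch, dim) array of clean feature vectors x_0
    t:        (batch,) integer timesteps, one per feature vector
    Returns the noised features and the Gaussian noise that was added.
    """
    noise = rng.standard_normal(features.shape)
    signal_scale = np.sqrt(alpha_bar[t])[:, None]
    noise_scale = np.sqrt(1.0 - alpha_bar[t])[:, None]
    return signal_scale * features + noise_scale * noise, noise

rng = np.random.default_rng(0)
num_steps = 1000
betas = linear_beta_schedule(num_steps)
alpha_bar = np.cumprod(1.0 - betas)       # cumulative signal retention

features = rng.standard_normal((8, 64))   # stand-in for frozen-extractor features
t = rng.integers(0, num_steps, size=8)    # random timestep per sample
noisy, noise = forward_noise(features, t, alpha_bar, rng)
```

In a training loop of this kind, a class-conditioned network would receive `noisy`, `t`, and the class label and be trained to regress `noise`; sampling then runs the learned reverse process to produce replay features for old classes.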

Paper Structure (25 sections, 10 equations, 8 figures, 9 tables)

Introduction
Related Work
Methodology
Self-Supervision for Generalizable Features
Diffusion-based Feature Replay
Prototype Calibration
Experiments
Experimental Setup
Quantitative Results
Comparative Study
Ablation Study
Conclusion
Supplementary Material
More Details for Class-Incremental Learning (CIL)
Formulation of Evaluation Metrics
...and 10 more sections

Figures (8)

Figure 1: Comparison of three feature replay approaches. (a) and (b) represent methods exemplified by PASS and FeTrIL, respectively; (c) represents our method (DiffFR). Compared to (a) and (b), we replay features that are highly similar to real features, thus improving NECIL performance. In addition, our approach does not suffer from representation shift or the inability to generalize.
Figure 2: Illustration of DiffFR for NECIL. Classes of the initial task are augmented by rotation transformation, which results in a difference between the classifier used in initial training and the one used in incremental training.
Figure 3: Illustration of prototype calibration. To learn the distribution of real features, we first normalize them by class. Subsequently, we use the diffusion model to learn the distribution of the normalized features. Finally, denormalization by class is applied to the generated samples to produce features with accurate prototypes.
Figure 4: Evolution of top-1 accuracy on three datasets with $T=10$ phases.
Figure 5: (a) Average accuracy of total/old/new classes at each phase. For better visualization, we use the setting of CIFAR-100 with $T=5$ as an example. (b) Visualization of real and generated features on CIFAR-100 (10 phases). Each class has 500 real features, and we generate 5000 features to facilitate a more accurate comparison between the distribution shapes of real and generated features.
...and 3 more figures
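
The prototype calibration steps in the Figure 3 caption (normalize by class, model the normalized features, denormalize by class) can be sketched roughly as follows. The caption does not define the normalization, so this sketch assumes per-class standardization: subtract the class mean (taken here as the prototype) and divide by the per-dimension class standard deviation. All function names are hypothetical, and the diffusion model itself is left out; the round trip below only shows that the stored class statistics restore the original feature locations.

```python
import numpy as np

def class_stats(features, labels, eps=1e-8):
    """Per-class mean (prototype) and per-dimension std; eps avoids division by zero."""
    stats = {}
    for c in np.unique(labels):
        class_feats = features[labels == c]
        stats[int(c)] = (class_feats.mean(axis=0), class_feats.std(axis=0) + eps)
    return stats

def normalize_by_class(features, labels, stats):
    """Remove each class's prototype and scale so only the distribution shape remains."""
    out = np.empty_like(features, dtype=float)
    for c, (mu, sigma) in stats.items():
        mask = labels == c
        out[mask] = (features[mask] - mu) / sigma
    return out

def denormalize_by_class(samples, labels, stats):
    """Map normalized samples back around their class prototype.

    Assumes every label in `labels` has an entry in `stats`.
    """
    out = np.empty_like(samples, dtype=float)
    for c, (mu, sigma) in stats.items():
        mask = labels == c
        out[mask] = samples[mask] * sigma + mu
    return out

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 50)                             # 3 classes, 50 features each
features = rng.standard_normal((150, 16)) + 5.0 * labels[:, None]  # class-dependent offsets

stats = class_stats(features, labels)
normalized = normalize_by_class(features, labels, stats)
restored = denormalize_by_class(normalized, labels, stats)       # round trip
```

Under this assumption, only the per-class mean and standard deviation need to be stored for old classes, and the generative model sees features whose class-specific location and scale have been removed.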
