Recommendation of data-free class-incremental learning algorithms by simulating future data

Eva Feillet, Adrian Popescu, Céline Hudelot

TL;DR

This work introduces an algorithm recommendation method for data-free class-incremental learning. Given an initial set of classes, it uses generative models to simulate future classes from the same visual domain, evaluates candidate algorithms on the simulated data stream, and recommends the one that performs best in the user-defined incremental setting.

Abstract

Class-incremental learning deals with sequential data streams composed of batches of classes. Various algorithms have been proposed to address the challenging case where samples from past classes cannot be stored. However, selecting an appropriate algorithm for a user-defined setting is an open problem, as the relative performance of these algorithms depends on the incremental setting. To solve this problem, we introduce an algorithm recommendation method that simulates the future data stream. Given an initial set of classes, it leverages generative models to simulate future classes from the same visual domain. We evaluate recent algorithms on the simulated stream and recommend the one that performs best in the user-defined incremental setting. We illustrate the effectiveness of our method on three large datasets using six algorithms and six incremental settings. Our method outperforms competitive baselines, and its performance is close to that of an oracle choosing the best algorithm in each setting. This work helps facilitate the practical deployment of incremental learning.
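
As a rough illustration of the recommendation step described in the abstract, the following Python sketch builds a class stream from an initial class set plus simulated class names, scores each candidate with a caller-supplied evaluator, and returns the best-scoring name. All names here (`build_stream`, `recommend`, `algo_a`, `algo_b`) and the toy scoring functions are hypothetical placeholders, not the authors' code or results. A real evaluator would train a data-free class-incremental learning algorithm on generated images and report its accuracy.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: an evaluator receives the class stream (one list of
# class names per incremental step) and returns a score, higher is better.
Stream = List[List[str]]
Evaluator = Callable[[Stream], float]


def build_stream(initial_classes: Sequence[str],
                 simulated_classes: Sequence[str],
                 classes_per_step: int) -> Stream:
    """Initial classes form step 1; simulated classes fill the later steps."""
    if classes_per_step <= 0:
        raise ValueError("classes_per_step must be positive")
    steps = [list(initial_classes)]
    for start in range(0, len(simulated_classes), classes_per_step):
        steps.append(list(simulated_classes[start:start + classes_per_step]))
    return steps


def recommend(evaluators: Dict[str, Evaluator], stream: Stream) -> str:
    """Score every candidate on the simulated stream and return the best name."""
    scores = {name: evaluate(stream) for name, evaluate in evaluators.items()}
    return max(scores, key=scores.get)


# Toy stand-ins for real training runs (made-up numbers, for illustration only):
# "algo_a" degrades as the number of steps grows, "algo_b" stays flat.
toy_evaluators: Dict[str, Evaluator] = {
    "algo_a": lambda stream: 0.9 - 0.01 * len(stream),
    "algo_b": lambda stream: 0.7,
}

short_stream = build_stream(["c0", "c1"], [f"sim{i}" for i in range(8)], 2)
long_stream = build_stream(["c0", "c1"], [f"sim{i}" for i in range(58)], 2)

print(recommend(toy_evaluators, short_stream))  # algo_a (5 steps)
print(recommend(toy_evaluators, long_stream))   # algo_b (30 steps)
```

The two toy streams only differ in length, yet the recommended name changes, which mirrors the abstract's point that the best choice depends on the incremental setting.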

Paper Structure (37 sections, 19 figures, 5 tables)

Figures (19)

  • Figure 1: Overview of the proposed method. A user needs an algorithm for a given DFCIL use case. The user has access to an initial labeled dataset $D_1$ and provides some expected characteristics of the incremental process, e.g. the number of classes per step. Based on these inputs, our method simulates a future data stream that extends $D_1$, first by proposing future class names, then by populating these classes with images. Next, it evaluates different DFCIL algorithms on the simulated dataset and recommends the one with the best performance for deployment with real data.
  • Figure 2: Average cosine distances between CLIP embeddings of new class names obtained using either Proxy21k (WordNet) or SimuGen (Llamav2) and the real class names of each dataset. Lower distances indicate a better fit between simulated and real class names.
  • Figure 3: Examples of images generated by Stable-Diffusion-v2-1-base with SimuGen (same prompt for each group, but different random seeds).
  • Figure 4: Detailed incremental accuracy for (a) iNat1k with $\text{Card}(P_1)=20$ and $T=50$ steps and (b) Land1k with $\text{Card}(P_1) = 100$ and $T = 10$ steps, and their corresponding simulated datasets obtained following SimuGen and Proxy21k.
  • Figure 5: Detailed view of the performance gap between the oracle and the recommendation methods after simulating $t = 1, 2, \dots, T$ incremental steps, for scenarios of the form $(\text{Card}(P_1), T)$. Results are averaged over the three datasets (IN1K, iNat1k, Land1k).
  • ...and 14 more figures
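
The distance reported in the Figure 2 caption can be sketched as follows. This is a minimal pure-Python illustration that assumes the embeddings are already available as lists of floats and that the average is taken over all (simulated, real) pairs. The caption does not specify the exact aggregation, so that choice, like the function names `cosine_distance` and `mean_cross_distance`, is an assumption. Computing the CLIP embeddings themselves is out of scope here.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def mean_cross_distance(simulated: Sequence[Sequence[float]],
                        real: Sequence[Sequence[float]]) -> float:
    """Average cosine distance over every (simulated, real) embedding pair.

    Assumption: all-pairs averaging; other aggregations are possible.
    """
    if not simulated or not real:
        raise ValueError("both embedding sets must be non-empty")
    total = sum(cosine_distance(s, r) for s in simulated for r in real)
    return total / (len(simulated) * len(real))
```

Under this definition, a lower value means the simulated class-name embeddings point in directions closer to those of the real class names.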