Class Balance Matters to Active Class-Incremental Learning

Zitong Huang, Ze Chen, Yuanze Li, Bowen Dong, Erjin Zhou, Yong Liu, Rick Siow Mong Goh, Chun-Mei Feng, Wangmeng Zuo

TL;DR

The paper tackles Active Class-Incremental Learning (ACIL) by addressing the tendency of traditional active learning to produce class-imbalanced labeled sets that degrade incremental learning. It introduces Class-Balanced Selection (CBS), a clustering-based, KL-divergence-guided greedy sampling strategy that aligns the distribution of selected samples with the unlabeled pool while preserving informativeness, and demonstrates its plug-and-play compatibility with pretrained-model–based CIL methods using prompt tuning (e.g., L2P, DualPrompt, LP-DiF). CBS consistently outperforms random sampling and existing active-learning baselines across five datasets under varying labeling budgets, and gains further when combined with LP-DiF’s unlabeled-data replay mechanism. The work shows that balancing class representation in the annotated pool is crucial for high-quality incremental learning, offering a practical approach to reduce labeling costs while maintaining strong performance in dynamic, multi-session settings.

Abstract

Few-Shot Class-Incremental Learning has shown remarkable efficacy in efficiently learning new concepts with limited annotations. Nevertheless, heuristically chosen few-shot annotations may not always cover the most informative samples, which largely restricts the capability of the incremental learner. We aim to start from a pool of large-scale unlabeled data and then annotate the most informative samples for incremental learning. Based on this premise, this paper introduces Active Class-Incremental Learning (ACIL). The objective of ACIL is to select the most informative samples from the unlabeled pool to effectively train an incremental learner, maximizing the performance of the resulting model. Note that vanilla active learning algorithms suffer from a class-imbalanced distribution among annotated samples, which restricts the ability of incremental learning. To achieve both class balance and informativeness in the chosen samples, we propose the Class-Balanced Selection (CBS) strategy. Specifically, we first cluster the features of all unlabeled images into multiple groups. Then, for each cluster, we employ a greedy selection strategy to ensure that the Gaussian distribution of the sampled features closely matches the Gaussian distribution of all unlabeled features within the cluster. CBS can be plugged into CIL methods that are based on pretrained models with prompt tuning. Extensive experiments under the ACIL protocol across five diverse datasets demonstrate that CBS outperforms both random selection and other SOTA active learning approaches. Code is publicly available at https://github.com/1170300714/CBS.
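The two steps described in the abstract (cluster the unlabeled features, then greedily pick samples whose Gaussian matches that of the whole cluster) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation; the released code at the URL above is the reference. Several choices here are assumptions made only for illustration: diagonal-covariance Gaussians, the direction KL(selected || cluster), a plain k-means with farthest-point initialization, an equal per-cluster quota of `budget // num_clusters`, and all function names (`diag_gaussian`, `kl_diag`, `kmeans`, `greedy_match`, `select_balanced`).

```python
import math

EPS = 1e-6  # assumed variance floor so a single-sample Gaussian stays well defined


def diag_gaussian(points):
    """Per-dimension mean and floored variance (diagonal Gaussian, an assumption)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    var = [sum((p[d] - mean[d]) ** 2 for p in points) / n + EPS for d in range(dim)]
    return mean, var


def kl_diag(p, q):
    """KL(p || q) between two diagonal Gaussians given as (mean, var) pairs."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        math.log(b / a) + (a + (m - n) ** 2) / b - 1.0
        for m, a, n, b in zip(mp, vp, mq, vq)
    )


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(features, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(features[0])]
    while len(centers) < k:
        far = max(features, key=lambda f: min(sq_dist(f, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(features)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(f, centers[c])) for f in features]
        for c in range(k):
            members = [f for f, a in zip(features, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = diag_gaussian(members)[0]
    return assign


def greedy_match(indices, features, quota):
    """Greedily add the sample that keeps the selected Gaussian closest to the cluster's."""
    if not indices:
        return []
    target = diag_gaussian([features[i] for i in indices])
    chosen, remaining = [], list(indices)
    while remaining and len(chosen) < quota:
        best = min(
            remaining,
            key=lambda i: kl_diag(
                diag_gaussian([features[j] for j in chosen + [i]]), target
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


def select_balanced(features, num_clusters, budget):
    """Cluster, then spend an equal share of the budget in every cluster (assumed split)."""
    assign = kmeans(features, num_clusters)
    quota = budget // num_clusters
    selected = []
    for c in range(num_clusters):
        members = [i for i, a in enumerate(assign) if a == c]
        selected.extend(greedy_match(members, features, quota))
    return selected


if __name__ == "__main__":
    # Two toy blobs standing in for two classes of unlabeled features.
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
             [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]
    print(select_balanced(feats, num_clusters=2, budget=4))
```

Because the budget is split across clusters rather than spent on a global informativeness ranking, each cluster (and, to the extent clusters align with classes, each class) receives annotations, which is the balance property the abstract argues for.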

Paper Structure

This paper contains 19 sections, 4 equations, 9 figures, 4 tables, 3 algorithms.

Figures (9)

  • Figure 1: Analysis of applying various active learning approaches to LP-DiF [huang2024learning] on CUB-200 under the ACIL protocol (see Sec. \ref{ref:exp_set}). (a) to (f) show the class distribution (first 100 classes of CUB-200) of samples selected by different active learning approaches, and (g) compares their corresponding performance on the test set. The samples selected by existing active learning methods ((b) to (e)) exhibit more severe class imbalance than random selection ((a)), which makes their performance worse than that of random sampling. In contrast, the proposed CBS ((f)) achieves more class-balanced sampling and thereby outperforms both random sampling and existing active learning methods.
  • Figure 2: Avg curves of CBS and competing methods applied to LP-DiF on five datasets ((b) to (f)) under various labeling budgets $B$. (a) shows the mean Avg curves over the five datasets.
  • Figure 3: Avg curves of CBS and competing methods applied to L2P [wang2022learning] on five datasets ((b) to (f)) under various labeling budgets $B$. (a) shows the mean Avg curves over the five datasets.
  • Figure 4: Avg curves of CBS and competing methods applied to DualPrompt [wang2022dualprompt] on five datasets ((b) to (f)) under various labeling budgets $B$. (a) shows the mean Avg curves over the five datasets.
  • Figure 5: Comparison of CBS and competing methods applied to LP-DiF in terms of "class-imbalanced ratio" on CUB-200 under various labeling budgets. Each curve represents a specific active learning method, and each point on the curve indicates the class-imbalanced ratio of that method at the corresponding session. The "class-imbalanced ratio" of a session is the number of samples the method selected for its most-sampled class divided by the number it selected for its least-sampled class.
  • ...and 4 more figures
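
The "class-imbalanced ratio" defined in the Figure 5 caption is a simple max-over-min count ratio, which a short sketch can make concrete. The function name `class_imbalance_ratio` and the optional `classes` argument are hypothetical. The caption does not say how a class with zero selected samples is treated, so returning infinity in that case is an assumption of this sketch.

```python
from collections import Counter


def class_imbalance_ratio(selected_labels, classes=None):
    """Count of the most-selected class divided by the count of the least-selected class.

    `classes` optionally lists every class of the session, so that classes
    with no selected sample are included (assumed to give an infinite ratio).
    """
    counts = Counter(selected_labels)
    if classes is None:
        classes = list(counts)
    per_class = [counts.get(c, 0) for c in classes]
    if not per_class:
        raise ValueError("no classes to compare")
    if min(per_class) == 0:
        return float("inf")
    return max(per_class) / min(per_class)


if __name__ == "__main__":
    # Class 0 was selected three times, class 1 once, class 2 twice.
    print(class_imbalance_ratio([0, 0, 0, 1, 2, 2]))  # 3.0
```

A perfectly balanced selection gives a ratio of 1, and larger values indicate that the annotation budget was concentrated on fewer classes.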