Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He, Fengqing Zhu
TL;DR
This work addresses Class-Incremental Learning (CIL) under real-world long-tailed distributions by identifying a dual imbalance: intra-phase imbalance among the classes within each task, and inter-phase imbalance between stored exemplars of old classes and new data. It introduces a gradient reweighting framework that adaptively balances per-class gradient contributions in the fully connected layer, coupled with a regularized softmax to avoid logit drift. To mitigate imbalanced forgetting across phases, it proposes Distribution-Aware Knowledge Distillation (DAKD) and a Decoupled Gradient Reweighting (DGR) scheme that handles plasticity and stability separately, with an attenuation mechanism that favors learning new classes as data accumulates. Experiments on CIFAR-100-LT, ImageNetSubset-LT, and Food101-LT under both the LFS and LFH protocols show consistent improvements over state-of-the-art methods, indicating improved robustness and effectiveness in real-world CIL scenarios.
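As a rough illustration of per-class gradient balancing in a fully connected layer, the sketch below scales each class's gradient row by the ratio of the smallest accumulated gradient magnitude to that class's own magnitude. This particular weighting rule, and the names `class_gradient_weights` and `reweight_fc_gradient`, are assumptions made for illustration; they are not taken from the paper, and the regularized softmax and DGR components are not modeled here.

```python
def class_gradient_weights(grad_norms):
    """Return one weight per class: min(grad_norms) / grad_norms[j].

    Assumed rule (hypothetical, not the paper's formula): the class with the
    smallest accumulated gradient magnitude keeps weight 1.0, and classes
    that dominate the updates are scaled down proportionally.
    """
    if any(g <= 0 for g in grad_norms):
        raise ValueError("accumulated gradient magnitudes must be positive")
    floor = min(grad_norms)
    return [floor / g for g in grad_norms]


def reweight_fc_gradient(fc_grad, weights):
    """Scale row j of the FC weight gradient (one row per class) by weights[j]."""
    return [[w * g for g in row] for row, w in zip(fc_grad, weights)]


# Toy example: class 0 is instance-rich and dominates the gradient.
grad_norms = [8.0, 2.0, 1.0]
weights = class_gradient_weights(grad_norms)

fc_grad = [[4.0, -4.0], [1.0, 1.0], [0.5, -0.5]]
balanced = reweight_fc_gradient(fc_grad, weights)
```

In this toy case every row of `balanced` ends up with the same magnitude, which is the intended effect of the assumed rule: no single class dominates the classifier update.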
Abstract
Class-Incremental Learning (CIL) trains a model to continually recognize new classes from non-stationary data while retaining learned knowledge. A major challenge arises when CIL is applied to real-world data with non-uniform distributions, which introduce a dual imbalance problem involving (i) disparities between the stored exemplars of old tasks and the new class data (inter-phase imbalance), and (ii) severe class imbalance within each individual task (intra-phase imbalance). We show that this dual imbalance causes skewed gradient updates and biased weights in the fully connected (FC) layers, inducing over- and under-fitting and catastrophic forgetting in CIL. Our method addresses this by reweighting the gradients towards balanced optimization and unbiased classifier learning. Additionally, we observe imbalanced forgetting: paradoxically, the instance-rich classes suffer greater performance degradation during CIL because a larger amount of their training data becomes unavailable in subsequent learning phases. To tackle this, we further introduce a distribution-aware knowledge distillation loss that mitigates forgetting by aligning output logits proportionally with the distribution of the lost training data. We validate our method on CIFAR-100, ImageNetSubset, and Food101 across various evaluation protocols and demonstrate consistent improvements over existing works, showing strong potential for applying CIL in real-world scenarios with enhanced robustness and effectiveness.
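To make the idea of aligning logits in proportion to the lost training data concrete, here is a minimal sketch under stated assumptions. It weights a plain squared logit difference between the old model (teacher) and the current model (student) by each old class's share of discarded samples. The squared-error form, the normalization to a mean weight of 1, and the names `lost_data_weights` and `weighted_logit_distillation` are illustrative choices, not the loss defined in the paper.

```python
def lost_data_weights(original_counts, exemplar_counts):
    """Weight each old class by its share of training samples that were discarded.

    Assumption (hypothetical): lost_j = original_j - exemplars_j, normalized
    so that the weights average to 1 across classes.
    """
    lost = [max(n - m, 0) for n, m in zip(original_counts, exemplar_counts)]
    total = sum(lost)
    if total == 0:
        return [1.0] * len(lost)  # nothing was discarded: uniform weights
    return [len(lost) * l / total for l in lost]


def weighted_logit_distillation(teacher_logits, student_logits, weights):
    """Mean of per-class squared logit differences, scaled by the class weights."""
    terms = [w * (t - s) ** 2
             for w, t, s in zip(weights, teacher_logits, student_logits)]
    return sum(terms) / len(terms)


# Toy example: three old classes with 500, 100, and 20 original samples,
# each reduced to 20 stored exemplars.
kd_weights = lost_data_weights([500, 100, 20], [20, 20, 20])

teacher = [2.0, 1.0, 0.0]
loss_head_drift = weighted_logit_distillation(teacher, [1.0, 1.0, 0.0], kd_weights)
loss_mid_drift = weighted_logit_distillation(teacher, [2.0, 0.0, 0.0], kd_weights)
```

With these weights, the same amount of logit drift is penalized more heavily on the instance-rich class than on the mid-sized class, matching the observation that classes losing more data are the ones most prone to forgetting. A class that loses no data receives zero weight under this assumed rule, which a practical implementation might want to floor at a small positive value.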
