Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

Jiangpeng He, Fengqing Zhu

TL;DR

This work addresses Class-Incremental Learning (CIL) under real-world long-tailed distributions by identifying a dual imbalance: intra-phase imbalance within each task and inter-phase imbalance between old exemplars and new data. It introduces a gradient reweighting framework that adaptively balances per-class gradient contributions in the fully connected layer, coupled with a regularized softmax to avoid logit drift. To mitigate imbalanced forgetting across phases, it proposes Distribution-Aware Knowledge Distillation (DAKD) and a Decoupled Gradient Reweighting (DGR) scheme that handles plasticity and stability separately, with an attenuation mechanism that favors learning on new classes as data accumulates. Experiments on CIFAR-100-LT, ImageNetSubset-LT, and Food101-LT under both the Learning From Scratch (LFS) and Learning From Half (LFH) protocols show consistent improvements over state-of-the-art methods, indicating improved robustness and effectiveness for real-world CIL scenarios.

Abstract

Class-Incremental Learning (CIL) trains a model to continually recognize new classes from non-stationary data while retaining learned knowledge. A major challenge of CIL arises when applying to real-world data characterized by non-uniform distribution, which introduces a dual imbalance problem involving (i) disparities between stored exemplars of old tasks and new class data (inter-phase imbalance), and (ii) severe class imbalances within each individual task (intra-phase imbalance). We show that this dual imbalance issue causes skewed gradient updates with biased weights in FC layers, thus inducing over/under-fitting and catastrophic forgetting in CIL. Our method addresses it by reweighting the gradients towards balanced optimization and unbiased classifier learning. Additionally, we observe imbalanced forgetting where paradoxically the instance-rich classes suffer higher performance degradation during CIL due to a larger amount of training data becoming unavailable in subsequent learning phases. To tackle this, we further introduce a distribution-aware knowledge distillation loss to mitigate forgetting by aligning output logits proportionally with the distribution of lost training data. We validate our method on CIFAR-100, ImageNetSubset, and Food101 across various evaluation protocols and demonstrate consistent improvements compared to existing works, showing great potential to apply CIL in real-world scenarios with enhanced robustness and effectiveness.
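
The reweighting idea described in the abstract can be illustrated with a minimal sketch. It assumes, borrowing the symbols from the Figure 3 caption, that each class $j$ keeps a cumulative gradient magnitude $\Phi_j$ for its row of the FC weight matrix and that the balance ratio takes the form $\alpha_j = \min_k \Phi_k / \Phi_j$. This formula, the function names, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import math

EPS = 1e-12  # guards against division by zero; an assumption of this sketch


def row_norm(row):
    """L2 norm of one class's gradient row."""
    return math.sqrt(sum(g * g for g in row))


def reweight_fc_gradient(grad_w, phi):
    """Scale each class row of an FC-layer gradient toward balance.

    grad_w: list of rows, one per class (gradient of the loss w.r.t. W^j).
    phi:    cumulative per-class gradient magnitudes from earlier iterations.
    Returns (scaled_grad, new_phi, alpha).
    """
    # Accumulate this iteration's per-class gradient magnitude.
    new_phi = [p + row_norm(row) for p, row in zip(phi, grad_w)]
    # Assumed ratio: the class with the smallest accumulated gradient keeps
    # weight 1, and classes that dominated so far are scaled down.
    floor = min(new_phi)
    alpha = [(floor + EPS) / (p + EPS) for p in new_phi]
    scaled = [[a * g for g in row] for a, row in zip(alpha, grad_w)]
    return scaled, new_phi, alpha


# Toy example: class 0 (head) receives a 10x larger gradient than class 1 (tail).
grad_w = [[3.0, 4.0], [0.3, 0.4]]
scaled, phi, alpha = reweight_fc_gradient(grad_w, phi=[0.0, 0.0])
print([round(a, 3) for a in alpha])
print([round(row_norm(r), 3) for r in scaled])
```

In this toy case the head-class row is scaled down until both rows have the same norm, which is the balancing effect the abstract attributes to gradient reweighting. A real training loop would apply the scaling to the classifier's gradient before the optimizer step.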

Paper Structure (16 sections, 12 equations, 6 figures, 2 tables)

This paper contains 16 sections, 12 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Illustration of imbalanced class-incremental learning with a dual imbalance issue: the intra-phase imbalance within each new task $\mathcal{T}$ and the inter-phase imbalance between old-task exemplars and new-task training data. $\mathcal{M}^{t}$ refers to the model after learning the new task $\mathcal{T}^t$.
  • Figure 2: The average magnitudes of gradient $||\nabla_{\mathcal{L}_{ce}}(W^j)||$ for each class $j$ by incrementally learning 3 tasks $\mathcal{T}^1,\mathcal{T}^2,\mathcal{T}^3$ with cross-entropy and memory budget $n_{\varepsilon} = 20$ exemplars per class.
  • Figure 3: Overview of gradient reweighting under imbalanced CIL. Given the classifier $W$, the intra-phase gradient reweighting scales the gradient matrix $\nabla_{\mathcal{L}_{ce}}(W)$ with class-balance ratios $\alpha$ derived from the cumulative gradients $\Phi$ over iterations. Concurrently, the inter-phase Decoupled Gradient Reweighting (DGR) balances plasticity learning by separately adjusting gradients with class-balance ratios $\alpha$ and task-balance ratios $r$, and then tunes the stability-plasticity trade-off with a loss-balance ratio $\beta$.
  • Figure 4: The classification accuracy (%) on test data belonging to all classes seen so far at each incremental step by varying the memory budget $n_\varepsilon \in \{10, 50\}$ on ImageNetSubset-LT and imbalance factor $\rho \in \{50, 150\}$ on Food101-LT.
  • Figure 5: The Forgetting GEM (%) at each incremental step by comparing our proposed DAKD with variants of distillation. The average classification accuracy (ACC) is shown in the legend ($\bullet$).
  • ...and 1 more figure
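
The imbalance factor $\rho$ and memory budget $n_\varepsilon$ in the captions above can be made concrete with a small sketch. It assumes the common exponential-decay construction for long-tailed benchmarks, in which $\rho$ is the ratio between the largest and smallest class sizes, and it assumes a class with fewer than $n_\varepsilon$ samples keeps all of them as exemplars. The function name and the chosen numbers are hypothetical; 500 images per class mirrors the CIFAR-100 training set.

```python
def long_tailed_counts(n_max, num_classes, rho):
    """Per-class sample counts decaying exponentially from n_max to n_max / rho."""
    if num_classes < 2:
        return [n_max] * num_classes
    return [
        round(n_max * rho ** (-i / (num_classes - 1)))
        for i in range(num_classes)
    ]


# Intra-phase imbalance: head classes have far more samples than tail classes.
counts = long_tailed_counts(n_max=500, num_classes=100, rho=50)

# Inter-phase imbalance: once a task is finished, each old class keeps at most
# n_eps exemplars (assumed rule: small classes keep everything they have).
n_eps = 20
exemplars = [min(c, n_eps) for c in counts]

print(counts[0], counts[-1])
print(max(exemplars), min(exemplars))
```

Under these assumptions a head class shrinks from hundreds of training images to $n_\varepsilon$ exemplars while a tail class loses few or none, which matches the abstract's observation that instance-rich classes lose the most training data in later phases.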