A Unified Perspective for Loss-Oriented Imbalanced Learning via Localization

Zitai Wang, Qianqian Xu, Zhiyong Yang, Zhikang Xu, Linchao Zhang, Xiaochun Cao, Qingming Huang

TL;DR

This work tackles the challenge of imbalanced class distributions by moving from global analyses of loss-oriented methods to a per-class, localized framework. It develops the concepts of localized calibration and local Lipschitz continuity, which enable fine-grained Fisher-consistency and data-dependent generalization analyses of re-weighting and logit-adjustment schemes. The authors introduce the CVS loss and a principled two-stage learning algorithm that combines multiplicative and additive logit adjustments with deferred re-weighting. Extensive experiments on ResNets and foundation models validate the approach. The results show improved minority-class generalization and calibrated predictions, offering a unified, principled approach to loss-oriented imbalanced learning with practical applicability.

Abstract

Due to the inherent imbalance in real-world datasets, naïve Empirical Risk Minimization (ERM) tends to bias the learning process towards the majority classes, hindering generalization to minority classes. To rebalance the learning process, one straightforward yet effective approach is to modify the loss function via class-dependent terms, such as re-weighting and logit-adjustment. However, existing analyses of these loss-oriented methods remain coarse-grained and fragmented, failing to explain some empirical results. After reviewing prior work, we find that the properties used throughout these analyses are typically global, i.e., defined over the whole dataset. Hence, these properties fail to capture how class-dependent terms influence the learning process. To bridge this gap, we explore localized versions of these properties, i.e., versions defined within each class. Specifically, we employ localized calibration to validate consistency across a broader range of losses, and localized Lipschitz continuity to derive a fine-grained generalization bound. In this way, we reach a unified perspective for improving and adjusting loss-oriented methods. Finally, we develop a principled learning algorithm based on these insights. Empirical results on both traditional ResNets and foundation models validate our theoretical analyses and demonstrate the effectiveness of the proposed method.

Paper Structure

This paper contains 35 sections, 16 theorems, 45 equations, 12 figures, 7 tables, and 1 algorithm.

Key Result

Proposition 1

Under Assumption (asm:global), the VS loss is Fisher consistent for any constants $\boldsymbol{\delta} \in \mathbb{R}_+^C$ if $\alpha_y = \delta_y / \pi_y$, $\boldsymbol{\beta} = \boldsymbol{1}$, and $\Delta_y = \log \delta_y$.
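A minimal numerical sketch of Proposition 1 follows. It rests on several assumptions. The VS loss is taken in its common form $\ell(y, f) = -\alpha_y \log \mathrm{softmax}(\boldsymbol{\beta} \odot f + \boldsymbol{\Delta})_y$, since the paper's exact parameterization is not reproduced here. The consistency target is taken to be the balanced error, as in Menon et al., whose Bayes classifier predicts $\arg\max_y p(y \mid x) / \pi_y$. Finally, the priors `pi`, the conditional probabilities `p`, and `delta` are hypothetical values. The sketch minimizes the conditional risk at a single input by gradient descent and checks which class the minimizer selects. All function names are illustrative.

```python
import numpy as np


def vs_loss(logits, y, alpha, beta, delta_add):
    """VS loss for one sample, assumed form: -alpha_y * log softmax(beta * f + Delta)_y."""
    z = beta * logits + delta_add
    z = z - z.max()  # shift for numerical stability; softmax is shift-invariant
    log_probs = z - np.log(np.exp(z).sum())
    return -alpha[y] * log_probs[y]


def proposition1_params(pi, delta):
    """Hyperparameters from Proposition 1: alpha_y = delta_y / pi_y, beta = 1, Delta_y = log delta_y."""
    return delta / pi, np.ones_like(pi), np.log(delta)


def conditional_risk(f, p, alpha, beta, delta_add):
    """Expected VS loss at a fixed x: sum_y p(y|x) * loss(f, y)."""
    return sum(p[y] * vs_loss(f, y, alpha, beta, delta_add) for y in range(len(p)))


def minimize_conditional_risk(p, alpha, delta_add, lr=1.0, steps=5000):
    """Gradient descent on the conditional risk (this gradient is valid for beta = 1 only)."""
    w = p * alpha
    w = w / w.sum()  # rescaling the objective does not move its minimizer
    f = np.zeros_like(p)
    for _ in range(steps):
        z = f + delta_add
        s = np.exp(z - z.max())
        s /= s.sum()
        # d/df sum_y w_y * (-log softmax(f + Delta)_y) = (sum_y w_y) * s - w = s - w
        f -= lr * (s - w)
    return f


pi = np.array([0.7, 0.2, 0.1])     # hypothetical training class priors
p = np.array([0.5, 0.3, 0.2])      # hypothetical p(y | x) at one input x
delta = np.array([1.0, 2.0, 0.5])  # arbitrary positive constants

alpha, beta, delta_add = proposition1_params(pi, delta)
f_star = minimize_conditional_risk(p, alpha, delta_add)

print(np.argmax(f_star))  # class chosen by the learned scores -> 2
print(np.argmax(p / pi))  # balanced Bayes prediction -> 2
print(np.argmax(p))       # plain Bayes prediction, favors the head class -> 0
```

Replacing `delta` with any other positive vector leaves `np.argmax(f_star)` unchanged, because the minimizer satisfies $f_y = \log(p_y / \pi_y) + \text{const}$. This invariance is what Fisher consistency for every $\boldsymbol{\delta}$ predicts. Setting $\boldsymbol{\delta} = \boldsymbol{\pi}$ recovers the logit-adjusted loss ($\alpha_y = 1$, $\Delta_y = \log \pi_y$), and setting $\boldsymbol{\delta} = \boldsymbol{1}$ recovers inverse-frequency re-weighting ($\alpha_y = 1/\pi_y$, $\Delta_y = 0$).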

Figures (12)

  • Figure 1: The calibration issue of minority classes. Although the mixup technique significantly improves overall, many-shot, and medium-shot calibration (a, b, and c, respectively), the model remains poorly calibrated on the few-shot classes (d).
  • Figure 2: (a) Training accuracy of CE+DRW ($T_0 = 160$) and the CB loss w.r.t. the training epoch. (b) $\widehat{Acc}_\text{min} / \widehat{Acc}_\text{maj}$ w.r.t. the DRW epoch $T_0$, where $\widehat{Acc}_\text{min}$ and $\widehat{Acc}_\text{maj}$ denote the training accuracy of the best model on the minority and majority classes, respectively. (c) Test accuracy of the best model w.r.t. the DRW epoch $T_0$. The DRW scheme balances the training accuracy between the majority and minority classes and thus improves test performance, consistent with theoretical insight (In2).
  • Figure 3: The empirical validation of Assumptions 2 and 3 under different settings. The x-axis denotes the training epochs, and the y-axis denotes the Expected Calibration Error (ECE).
  • Figure 4: Balanced accuracy of the CE loss and the LDAM loss w.r.t. $\alpha_y \propto \pi_y^{-\nu}$ on the CIFAR datasets with imbalance ratio $\rho = 100$. Both re-weighting and logit-adjustment boost model performance, consistent with theoretical insights (In1) and (In4-b).
  • Figure 5: Sensitivity analysis of VS+ADRW w.r.t. $\alpha_y \propto \pi_y^{-\nu}$ and $\Delta_y = \tau \log \pi_y$ on the CIFAR-10 dataset with imbalance ratio $\rho = 100$. Both re-weighting and logit-adjustment boost model performance, consistent with theoretical insights (In1) and (In4-b).
  • ...and 7 more figures
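The figure captions above parameterize the class-dependent terms as $\alpha_y \propto \pi_y^{-\nu}$ and $\Delta_y = \tau \log \pi_y$. Figure 2 also refers to deferred re-weighting (DRW), which switches on at epoch $T_0$. The sketch below computes these terms from class counts under several stated assumptions:

- the exponential long-tailed profile $n_i = n_{\max}\,\rho^{-i/(C-1)}$, commonly used for CIFAR-LT;
- weights normalized to sum to $C$;
- a DRW rule that uses uniform weights before $T_0$ and $\alpha$ afterwards.

Helper names are hypothetical, and the sketch does not reproduce the paper's ADRW variant.

```python
import numpy as np


def long_tailed_counts(n_max, num_classes, rho):
    """Exponential long-tailed profile (assumed): n_i = n_max * rho^(-i / (C - 1)), rounded."""
    i = np.arange(num_classes)
    return np.rint(n_max * float(rho) ** (-i / (num_classes - 1))).astype(int)


def reweighting(pi, nu):
    """alpha_y proportional to pi_y^(-nu), normalized so the weights sum to C (assumed)."""
    alpha = pi ** (-nu)
    return alpha * len(pi) / alpha.sum()


def logit_adjustment(pi, tau):
    """Additive term Delta_y = tau * log(pi_y)."""
    return tau * np.log(pi)


def drw_weights(epoch, t0, alpha):
    """Deferred re-weighting (assumed rule): uniform weights before t0, alpha from t0 on."""
    return np.ones_like(alpha) if epoch < t0 else alpha


counts = long_tailed_counts(n_max=5000, num_classes=10, rho=100)
pi = counts / counts.sum()
alpha = reweighting(pi, nu=1.0)
delta_add = logit_adjustment(pi, tau=1.0)

print(counts[0], counts[-1])                        # -> 5000 50
print(drw_weights(epoch=100, t0=160, alpha=alpha))  # all ones before T_0
print(drw_weights(epoch=200, t0=160, alpha=alpha))  # re-weighted after T_0
```

Setting `nu=0` disables re-weighting and `tau=0` disables logit adjustment. This matches how the sensitivity sweeps in Figures 4 and 5 vary one knob at a time.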

Theorems & Definitions (30)

  • Definition 1: Fisher Consistency
  • Proposition 1 (Menon et al., ICLR 2021)
  • Proposition 2
  • Proposition 3
  • Proposition 4: Union Bound for Imbalanced Learning (Cao et al., NeurIPS 2019)
  • Definition 2: Lipschitz Continuity
  • Lemma 1: Contraction Lemma
  • Lemma 2
  • Remark 1
  • Definition 3: Local Lipschitz Continuity
  • ...and 20 more
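The list above contrasts Lipschitz continuity (Definition 2) with its local counterpart (Definition 3). The following is an illustration only and is not the paper's Definition 3. It makes two assumptions: Lipschitz continuity is measured in the $\ell_2$ norm of the logits, and the loss takes the VS form $-\alpha_y \log \mathrm{softmax}(\boldsymbol{\beta} \odot f + \boldsymbol{\Delta})_y$. Under these assumptions, the gradient for class $y$ is $\alpha_y\, \boldsymbol{\beta} \odot (\mathbf{s} - \mathbf{e}_y)$, and its norm is at most $\sqrt{2}\,\alpha_y \max_j \beta_j$. A single global constant must cover the largest class, whereas a per-class constant keeps the factor $\alpha_y$. The sketch estimates both empirically from random logits. The priors and sampling settings are hypothetical.

```python
import numpy as np


def vs_grad(logits, y, alpha, beta, delta_add):
    """Gradient of -alpha_y * log softmax(beta * f + Delta)_y w.r.t. the logits f."""
    z = beta * logits + delta_add
    s = np.exp(z - z.max())
    s /= s.sum()
    e_y = np.zeros_like(s)
    e_y[y] = 1.0
    return alpha[y] * beta * (s - e_y)


def empirical_lipschitz(alpha, beta, delta_add, num_samples=2000, scale=5.0, seed=0):
    """Largest observed gradient norm per class over random logits (a lower estimate)."""
    rng = np.random.default_rng(seed)
    num_classes = len(alpha)
    per_class = np.zeros(num_classes)
    for _ in range(num_samples):
        f = rng.normal(scale=scale, size=num_classes)
        for y in range(num_classes):
            g = np.linalg.norm(vs_grad(f, y, alpha, beta, delta_add))
            per_class[y] = max(per_class[y], g)
    return per_class


pi = np.array([0.6, 0.3, 0.1])  # hypothetical class priors
alpha = 1.0 / pi                # inverse-frequency weights
beta = np.ones(3)
delta_add = np.zeros(3)

local = empirical_lipschitz(alpha, beta, delta_add)
bound = np.sqrt(2) * alpha * beta.max()  # analytic per-class bound
print(local)        # per-class (local) constants, growing with alpha_y
print(local.max())  # what a single global constant must cover
```

With inverse-frequency weights, the minority-class constant is several times the majority-class one. This is one way to see why a per-class treatment can yield a finer bound than a single global constant.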