Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier

Fan Zhang, Wei Qin, Weijieying Ren, Lei Wang, Zetong Chen, Richang Hong

TL;DR

This work tackles long-tailed recognition by revealing how imbalanced gradient magnitudes and inter-class negative-gradient distributions bias classifiers toward head classes. It introduces Gradient-Aware Logit Adjustment (GALA), which adds two gradient-informed margins to logits using accumulated positive gradients $\theta_j$ and accumulated negative gradients $\phi_k$ to balance optimization across classes. A simple post hoc prediction re-balancing strategy further mitigates residual head-class bias at inference. Across multiple LT benchmarks, GALA establishes strong improvements over prior methods (notably GCL), with additional gains when combined with prediction re-balancing, highlighting its practical impact for robust tail-class learning.

Abstract

In real-world settings, data often follow a long-tailed distribution, where head classes contain significantly more training samples than tail classes. Consequently, models trained on such data tend to be biased toward head classes. This bias is transmitted through imbalanced gradients, which involve not only an imbalanced ratio between positive and negative gradient magnitudes but also imbalanced gradients across different negative classes. We therefore propose the Gradient-Aware Logit Adjustment (GALA) loss, which adjusts the logits based on accumulated gradients to balance the optimization process. Additionally, we find that most solutions to long-tailed problems remain biased toward head classes, and we propose a simple post hoc prediction re-balancing strategy to further mitigate this bias. Extensive experiments on multiple popular long-tailed recognition benchmarks evaluate the effectiveness of these two designs. Our approach achieves top-1 accuracy of 48.5%, 41.4%, and 73.3% on CIFAR100-LT, Places-LT, and iNaturalist, outperforming the state-of-the-art method GCL by margins of 3.62%, 0.76%, and 1.2%, respectively. Code is available at https://github.com/lt-project-repository/lt-project.
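To make the notion of positive and negative gradients concrete, the following is a minimal sketch for plain softmax cross entropy, where the gradient of the loss with respect to logit $z_j$ is $p_j - y_j$. It accumulates, per class, the gradient magnitude received from the class's own samples (positive) and from samples of other classes (negative). The function names and the toy class counts are assumptions made for illustration; this section does not state how GALA turns such accumulators into margins, so the sketch stops at the statistics themselves.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def accumulate_gradients(logits, labels, num_classes):
    """Per-class positive and negative gradient magnitudes for cross entropy.

    For a sample with label y, dL/dz_j = p_j - 1 if j == y, else p_j.
    Hypothetical helper for illustration, not the authors' implementation.
    """
    probs = softmax(logits)
    onehot = np.eye(num_classes)[labels]
    # Positive gradient of class j: pull from its own samples, |p_j - 1|.
    pos_grad = ((1.0 - probs) * onehot).sum(axis=0)
    # Negative gradient of class j: push from samples of other classes, p_j.
    neg_grad = (probs * (1.0 - onehot)).sum(axis=0)
    return pos_grad, neg_grad

# Toy long-tailed batch (assumed counts): 90 head, 9 medium, 1 tail sample.
labels = np.repeat(np.arange(3), [90, 9, 1])
logits = np.zeros((labels.size, 3))  # untrained model: uniform predictions

pos_grad, neg_grad = accumulate_gradients(logits, labels, num_classes=3)
gradient_ratio = pos_grad / neg_grad  # positive-to-negative ratio per class
print(gradient_ratio)
```

With uniform predictions, the head class receives far more positive than negative gradient, while the tail class is dominated by negative gradient from the other classes, which is the imbalance the abstract describes.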

Paper Structure (8 sections, 9 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: (a) shows the imbalanced weight norms of classifiers biased by long-tailed training. CAS (Kang2020Decoupling) stands for class-aware sampler, which balances the number of samples per class. Statistics are from experiments on ImageNet-LT. (b) reports the gradient ratio (the ratio of positive to negative gradients) for Cross Entropy, EQL (Tan2020EqualizationLF), and our GALA loss. (c) reports the imbalance of negative gradients received from different classes for Cross Entropy, EQL, and our GALA loss. (d) shows the average similarity between a tail-class weight vector and the features of its own class when the model is trained with Cross Entropy and with our GALA loss. Statistics for (b), (c), and (d) are from experiments on CIFAR100-LT.
  • Figure 2: The number of positive predictions (the number of samples predicted as each class) for different methods.
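
The statistic plotted in Figure 2 can be sketched as a per-class count of argmax predictions. The helper name and the toy score matrix below are assumptions for illustration, not taken from the paper's code.

```python
import numpy as np

def positive_prediction_counts(scores, num_classes):
    """Count how many samples are predicted as each class.

    scores: array of shape (num_samples, num_classes) with logits or
    probabilities. Hypothetical helper for illustration.
    """
    predicted = scores.argmax(axis=1)
    # minlength keeps classes that are never predicted in the output.
    return np.bincount(predicted, minlength=num_classes)

# Toy scores (assumed values): class 0 plays the role of a head class.
scores = np.array([
    [2.0, 1.0, 0.1],
    [1.5, 1.4, 0.2],
    [0.9, 0.3, 0.8],
    [0.2, 1.1, 0.4],
    [0.6, 0.1, 0.5],
])
counts = positive_prediction_counts(scores, num_classes=3)
print(counts)  # [4 1 0]
```

On a class-balanced test set, an unbiased classifier would produce roughly equal counts; counts skewed toward head classes indicate the residual bias that the post hoc re-balancing step targets.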