Long-tail learning via logit adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar
TL;DR
This work addresses long-tailed label distributions by introducing logit adjustment for softmax classification. It provides a theoretical framing that ties logit adjustment to the Bayes-optimal decision rule under the balanced error rate (BER), and offers two practical realizations: a post-hoc translation of a trained model's logits, and a logit-adjusted loss that incorporates class priors during training. The approach unifies and improves upon prior post-hoc and margin-based methods; it is Fisher consistent for the BER and shows strong empirical gains on synthetic and real-world long-tailed datasets, especially for rare classes. Both realizations are simple modifications of standard softmax cross-entropy training.
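The link between logit adjustment and the balanced error can be sketched in a few lines. The notation here is an assumption made for illustration and may differ from the paper's: $L$ is the number of classes, $\pi_y = \mathbb{P}(y)$ is the class prior, $f_y(x)$ is the logit for class $y$, and $s^*_y(x)$ denotes logits of an idealized model with $\mathbb{P}(y \mid x) \propto \exp(s^*_y(x))$.

```latex
% Balanced error: the per-class error rates, averaged uniformly over classes.
\mathrm{BER}(f) = \frac{1}{L} \sum_{y=1}^{L}
  \mathbb{P}_{x \mid y}\!\left( y \notin \operatorname*{argmax}_{y'} f_{y'}(x) \right)

% The Bayes-optimal prediction under the balanced error divides out the prior:
\operatorname*{argmax}_{y} \mathbb{P}(x \mid y)
  = \operatorname*{argmax}_{y} \frac{\mathbb{P}(y \mid x)}{\pi_y}

% With P(y | x) \propto exp(s^*_y(x)), this is a shift of the logits by -log(pi_y):
\operatorname*{argmax}_{y} \frac{\exp(s^*_y(x))}{\pi_y}
  = \operatorname*{argmax}_{y} \left[ s^*_y(x) - \log \pi_y \right]
```

Read this way, subtracting $\log \pi_y$ from each logit converts a predictor tuned for the ordinary misclassification error into one tuned for the balanced error.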
Abstract
Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these challenges. Our techniques revisit the classic idea of logit adjustment based on the label frequencies, either applied post-hoc to a trained model, or enforced in the loss during training. Such adjustment encourages a large relative margin between logits of rare versus dominant labels. These techniques unify and generalise several recent proposals in the literature, while possessing firmer statistical grounding and empirical performance.
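The two modifications described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the scaling parameter `tau`, and the toy class counts are assumptions chosen for illustration, and a real training setup would apply the same shifts to batched tensors in a deep learning framework.

```python
import math

def class_priors(counts):
    """Estimate class priors from training label counts (hypothetical helper)."""
    total = sum(counts)
    return [c / total for c in counts]

def posthoc_predict(logits, priors, tau=1.0):
    """Post-hoc adjustment: subtract tau * log(prior) from each logit of an
    already trained model, then take the argmax."""
    adjusted = [z - tau * math.log(p) for z, p in zip(logits, priors)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)

def logit_adjusted_loss(logits, label, priors, tau=1.0):
    """Logit-adjusted softmax cross-entropy for one example: add
    tau * log(prior) to each logit before the usual cross-entropy."""
    shifted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    m = max(shifted)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(s - m) for s in shifted))
    return log_norm - shifted[label]

# Toy long-tailed problem: one dominant class and two rarer ones (assumed counts).
priors = class_priors([900, 90, 10])
logits = [2.0, 1.0, 0.5]

plain_prediction = max(range(len(logits)), key=logits.__getitem__)
adjusted_prediction = posthoc_predict(logits, priors)
rare_class_loss = logit_adjusted_loss(logits, label=2, priors=priors)
```

In this toy case the unadjusted argmax selects the dominant class 0, while the post-hoc adjustment selects the rare class 2, because the rare class receives the largest boost from `-log(prior)`. Setting `tau=0` recovers standard softmax cross-entropy, and the loss for a rare label grows when `tau > 0`, which is what pushes training towards a larger margin for rare classes.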
