Long-Tailed Recognition via Information-Preservable Two-Stage Learning
Fudong Lin, Xu Yuan
TL;DR
This work tackles long-tailed recognition with a two-stage learning framework. The first stage builds a high-quality, well-separated feature space via Balanced Negative Sampling (BNS), which maximizes mutual information between augmented views and is shown to be theoretically equivalent to minimizing intra-class distance. The second stage uses an Information-Preservable Determinantal Point Process (IP-DPP), built on an L-ensemble DPP construction, to sample balanced subsets that prioritize instances with high information content while maintaining diversity. The approach achieves state-of-the-art results on CIFAR-10/100-LT, ImageNet-LT, and iNaturalist 2018, with strong tail-class performance and competitive overall accuracy, supported by linear probing and ablation studies. By enriching representations through BNS and preserving valuable information through IP-DPP, the method mitigates majority bias in imbalanced data and applies across architectures and dataset scales.
Abstract
Imbalance (the long-tail phenomenon) is inherent in many real-world data distributions and often biases deep classification models toward frequent classes, resulting in poor performance on tail classes. In this paper, we propose a novel two-stage learning approach that mitigates this majority-biased tendency while preserving valuable information within datasets. Specifically, the first stage introduces a new representation learning technique from an information-theoretic perspective. This technique is theoretically equivalent to minimizing intra-class distance, yielding an effective and well-separated feature space. The second stage develops a novel sampling strategy that selects mathematically informative instances and can rectify majority-biased decision boundaries without compromising a model's overall performance. As a result, our approach achieves state-of-the-art performance across various long-tailed benchmark datasets, as validated by extensive experiments. Our code is available at https://github.com/fudong03/BNS_IPDPP.
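To make the second-stage idea more concrete, the following is a minimal sketch of subset selection with a generic L-ensemble DPP, assuming the standard quality-diversity decomposition L_ij = q_i * <phi_i, phi_j> * q_j. It is not the paper's IP-DPP: the function names, the greedy selection rule, and the quality scores are hypothetical placeholders, and how the paper actually measures information content is not reproduced here.

```python
import numpy as np


def build_l_ensemble(features, quality):
    """Build an L-ensemble kernel L_ij = q_i * <phi_i, phi_j> * q_j.

    `features` (n x d) and `quality` (n,) are illustrative inputs; the
    quality scores stand in for an information measure (an assumption).
    """
    # Unit-normalize rows so the inner product is a cosine similarity.
    phi = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = phi @ phi.T
    return quality[:, None] * similarity * quality[None, :]


def greedy_dpp_subset(L, k):
    """Greedily pick up to k indices that approximately maximize det(L_Y).

    This is a simple greedy heuristic, not exact DPP sampling.
    """
    selected = []
    remaining = list(range(L.shape[0]))
    for _ in range(k):
        best_idx, best_logdet = None, -np.inf
        for i in remaining:
            candidate = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(candidate, candidate)])
            # Only positive determinants correspond to valid, non-degenerate subsets.
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:
            break  # every remaining item is redundant with the current subset
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Toy usage: item 1 duplicates item 0; item 2 is orthogonal but lower quality.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
quality = np.array([1.0, 0.9, 0.5])
subset = greedy_dpp_subset(build_l_ensemble(features, quality), k=2)
print(subset)
```

In the toy example, the determinant penalizes the near-duplicate pair, so the greedy step passes over the high-quality duplicate in favor of the lower-quality but dissimilar item. This illustrates the quality-versus-diversity trade-off that a DPP-based sampler exploits.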
