BCE3S: Binary Cross-Entropy Based Tripartite Synergistic Learning for Long-tailed Recognition
Weijia Fan, Qiufu Li, Jiajun Wen, Xiaoyang Peng
TL;DR
The paper addresses long-tailed recognition (LTR) by identifying two limitations of the cross-entropy (CE) loss: its Softmax couples the metrics between a feature and all classifier vectors, and it leaves the classifier vectors imbalanced. It proposes BCE-based tripartite synergistic learning (BCE3S), which combines BCE-based joint learning, BCE-based contrastive learning, and BCE-based uniform classifier separability learning to decouple these metrics, improve feature compactness, and balance classifier separability. Empirical results on CIFAR-LT, ImageNet-LT, and iNaturalist2018 show that BCE3S achieves state-of-the-art performance and more uniform feature and classifier properties, with notable gains in tail-class accuracy and overall robustness. The approach does so without prohibitive computational overhead and is compatible with existing re-balancing techniques.
Abstract
For long-tailed recognition (LTR) tasks, high intra-class compactness and inter-class separability of features in both head and tail classes, as well as balanced separability among all the classifier vectors, are preferred. Existing LTR methods based on the cross-entropy (CE) loss not only struggle to learn features with these desirable properties but also couple the imbalanced classifier vectors in the denominator of the Softmax, amplifying the imbalance effects in LTR. In this paper, we propose a binary cross-entropy (BCE)-based tripartite synergistic learning method for LTR, termed BCE3S, which consists of three components: (1) BCE-based joint learning optimizes both the classifier and the sample features, and achieves better compactness and separability among features than CE-based joint learning by decoupling the metrics between each feature and the imbalanced classifier vectors across multiple Sigmoid functions; (2) BCE-based contrastive learning further improves the intra-class compactness of features; (3) BCE-based uniform learning balances the separability among classifier vectors and, in combination with the joint learning, interactively enhances the feature properties. Extensive experiments show that an LTR model trained with BCE3S not only achieves higher compactness and separability among sample features but also balances the classifier's separability, achieving state-of-the-art (SOTA) performance on various long-tailed datasets such as CIFAR10-LT, CIFAR100-LT, ImageNet-LT, and iNaturalist2018.
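
The coupling argument in the abstract can be illustrated with a small numerical sketch. It is not the paper's implementation: it assumes plain logits (one per class, for example inner products between a feature and the classifier vectors) and the standard one-vs-rest form of BCE, and it omits the contrastive and uniform terms of BCE3S. The logit values and the helper names `ce_grad` and `bce_grad` are hypothetical, chosen only for illustration. Under CE, the gradient on each logit passes through the shared Softmax denominator, so raising the logit of a head class changes the gradient on every other class. Under BCE, each logit goes through its own Sigmoid, so the other classes are unaffected.

```python
import numpy as np

def softmax(z):
    # Numerically stable Softmax over all class logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_grad(z, y):
    # Gradient of CE loss w.r.t. the logits: softmax(z) - onehot(y).
    # Every entry depends on all logits through the shared denominator.
    g = softmax(z)
    g[y] -= 1.0
    return g

def bce_grad(z, y):
    # Gradient of one-vs-rest BCE w.r.t. the logits: sigmoid(z) - onehot(y).
    # Entry j depends only on logit j.
    g = sigmoid(z)
    g[y] -= 1.0
    return g

# Hypothetical logits: class 0 plays the role of a head class,
# class 2 is the ground-truth (tail) class.
z = np.array([2.0, 0.5, -1.0])
y = 2

# Raise only the head-class logit and compare gradients on the other classes.
z_shift = z.copy()
z_shift[0] += 3.0

ce_change = ce_grad(z_shift, y)[1:] - ce_grad(z, y)[1:]
bce_change = bce_grad(z_shift, y)[1:] - bce_grad(z, y)[1:]

print("CE  gradient change on classes 1, 2:", ce_change)   # nonzero: coupled
print("BCE gradient change on classes 1, 2:", bce_change)  # zero: decoupled
```

In this toy setting, the BCE gradient on the tail class is unchanged however large the head-class logit becomes, which is the decoupling the abstract attributes to using multiple Sigmoid functions instead of a single Softmax.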
