BCE3S: Binary Cross-Entropy Based Tripartite Synergistic Learning for Long-tailed Recognition

Weijia Fan, Qiufu Li, Jiajun Wen, Xiaoyang Peng

TL;DR

The paper addresses long-tailed recognition by identifying two limitations of the cross-entropy (CE) loss: it couples the metrics of all classes through the Softmax denominator, and it induces imbalanced classifier vectors. It proposes BCE-based tripartite synergistic learning (BCE3S), which combines BCE-based joint learning, BCE-based contrastive learning, and BCE-based uniform classifier separability learning to decouple these metrics, improve feature compactness, and balance classifier separability. Experiments on CIFAR-LT, ImageNet-LT, and iNaturalist2018 show that BCE3S achieves state-of-the-art performance and more uniform feature and classifier properties, with notable gains in tail-class accuracy. The method adds no prohibitive computational overhead and is compatible with existing re-balancing techniques.
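
The Softmax coupling described above can be seen in a few lines. The sketch below is illustrative only and is not code from the paper: the logit values are made up, and an inflated head-class logit stands in for a head-class classifier vector with a larger norm. It compares the target-class probability under a shared Softmax with an independent per-class Sigmoid, as used in BCE.

```python
import math

def softmax_prob(logits, y):
    """Softmax probability of class y: every logit enters the shared denominator."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[y] / sum(exps)

def sigmoid_prob(logits, y):
    """Independent Sigmoid probability of class y: depends only on logits[y]."""
    z = logits[y]
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable branch for negative logits
    return e / (1.0 + e)

# Hypothetical logits for one tail-class sample whose target class is index 2.
base = [1.0, 0.5, 2.0]
# Same sample, but the head-class classifier vector (index 0) has grown,
# so only the head-class logit increases.
inflated = [3.0, 0.5, 2.0]

print("softmax p(y):", softmax_prob(base, 2), "->", softmax_prob(inflated, 2))
print("sigmoid p(y):", sigmoid_prob(base, 2), "->", sigmoid_prob(inflated, 2))
```

With these numbers, the Softmax probability of the target class falls from about 0.63 to about 0.25 although nothing about the target class changed, while the Sigmoid probability stays at about 0.88.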

Abstract

For long-tailed recognition (LTR) tasks, high intra-class compactness and inter-class separability in both head and tail classes, as well as balanced separability among all the classifier vectors, are preferred. Existing LTR methods based on the cross-entropy (CE) loss not only struggle to learn features with these desirable properties but also couple the imbalanced classifier vectors in the denominator of the Softmax, amplifying the imbalance effects in LTR. In this paper, we propose a binary cross-entropy (BCE)-based tripartite synergistic learning method for LTR, termed BCE3S, which consists of three components: (1) BCE-based joint learning optimizes both the classifier and the sample features; by decoupling the metrics between each feature and the imbalanced classifier vectors into multiple independent Sigmoid functions, it achieves better compactness and separability among features than CE-based joint learning; (2) BCE-based contrastive learning further improves the intra-class compactness of features; (3) BCE-based uniform learning balances the separability among classifier vectors and, combined with the joint learning, interactively enhances the feature properties. Extensive experiments show that the LTR model trained by BCE3S not only achieves higher compactness and separability among sample features but also balances the classifier's separability, achieving state-of-the-art (SOTA) performance on various long-tailed datasets, including CIFAR10-LT, CIFAR100-LT, ImageNet-LT, and iNaturalist2018.
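
A minimal sketch of how the three BCE terms described in the abstract could be combined, in plain Python. Everything here is an assumption rather than the paper's exact formulation: cosine-similarity logits with a hypothetical scale `s=10`, no bias or margin terms, every pair of distinct classifier vectors treated as a negative pair in the uniform term, and hypothetical weights `lam_ss` and `lam_cc`. The names `loss_sc`, `loss_ss`, `loss_cc`, and `bce3s_loss` are illustrative and mirror the $L_{\text{bce}}^{\text{(sc)}}$, $L_{\text{bce}}^{\text{(ss)}}$, and $L_{\text{bce}}^{\text{(cc)}}$ notation used in the figure captions.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left as zeros)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def cos(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def bce(logit, target):
    """Numerically stable binary cross-entropy on one logit with a 0/1 target."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def loss_sc(feats, labels, classifiers, s=10.0):
    """Sample-to-class term: one independent Sigmoid per class, no shared denominator."""
    total = 0.0
    for f, y in zip(feats, labels):
        for k, w in enumerate(classifiers):
            total += bce(s * cos(f, w), 1.0 if k == y else 0.0)
    return total / len(feats)

def loss_ss(feats, labels, s=10.0):
    """Sample-to-sample term: same-class pairs are positives, all other pairs negatives."""
    total, pairs = 0.0, 0
    for i in range(len(feats)):
        for j in range(len(feats)):
            if i != j:
                target = 1.0 if labels[i] == labels[j] else 0.0
                total += bce(s * cos(feats[i], feats[j]), target)
                pairs += 1
    return total / max(pairs, 1)

def loss_cc(classifiers, s=10.0):
    """Class-to-class term: every pair of distinct classifier vectors is a negative pair."""
    total, pairs = 0.0, 0
    for i in range(len(classifiers)):
        for j in range(len(classifiers)):
            if i != j:
                total += bce(s * cos(classifiers[i], classifiers[j]), 0.0)
                pairs += 1
    return total / max(pairs, 1)

def bce3s_loss(feats, labels, classifiers, lam_ss=1.0, lam_cc=1.0):
    """Hypothetical weighted sum of the three BCE terms."""
    return (loss_sc(feats, labels, classifiers)
            + lam_ss * loss_ss(feats, labels)
            + lam_cc * loss_cc(classifiers))

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # three hypothetical classifier vectors
feats = [[0.9, 0.1], [1.0, -0.1], [0.1, 1.0]]
labels = [0, 0, 1]
print(bce3s_loss(feats, labels, W))
```

Because `loss_cc` penalizes every pair of classifier vectors in the same way, regardless of how many samples each class has, it does not favor head classes; this is one plausible reading of "uniform" separability learning.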

Paper Structure

This paper contains 13 sections, 29 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Pipeline of BCE3S, integrating all three learning modes: joint learning between sample features and the classifier ($L_{\text{bce}}^{\text{(sc)}}$), contrastive learning among features ($L_{\text{bce}}^{\text{(ss)}}$), and the classifier's uniform separability learning ($L_{\text{bce}}^{\text{(cc)}}$).
  • Figure 2: The intra-class compactness (top) and inter-class separability (middle) of sample features, and the separability (bottom) of classifier vectors, on the training set of CIFAR100-LT (imbalance factor IF $=100$), for models trained with different CE-based (left) and BCE-based (right) methods.
  • Figure 3: Feature distribution on the CIFAR10-LT test set with CE (top) and BCE (bottom) learning methods. Compared to CE methods, features extracted using BCE-based joint learning $L_{\text{bce}}^{\text{(sc)}}$ show improved intra-class compactness and inter-class separability. The contrastive learning $L_{\text{bce}}^{\text{(ss)}}$ and uniform learning $L_{\text{bce}}^{\text{(cc)}}$ further enhance these properties.
  • Figure 4: Descriptions of the long-tailed datasets used.
  • Figure 5: Parameter study for $L_{\text{bce}}^{\text{(sc)}}$ (top left), $L_{\text{bce}}^{\text{(cc)}}$ (top right), and $L_{\text{bce}}^{\text{(ss)}}$ (bottom) on CIFAR100-LT with IF $=100$.
  • ...and 5 more figures
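
Figure 2 reports per-class compactness and separability statistics. The paper's exact definitions are not given in this excerpt, so the sketch below uses one common cosine-based choice, which is an assumption: compactness is the mean cosine between each feature and its class mean, and separability is one minus the largest cosine between a class (its feature mean or its classifier vector) and any other class. All function names are hypothetical.

```python
import math
from collections import defaultdict

def _unit(v):
    """Scale a vector to unit L2 norm (zero vectors are left as zeros)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def _cos(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(_unit(a), _unit(b)))

def class_means(feats, labels):
    """Mean of the L2-normalized features of each class."""
    groups = defaultdict(list)
    for f, y in zip(feats, labels):
        groups[y].append(_unit(f))
    return {y: [sum(col) / len(fs) for col in zip(*fs)] for y, fs in groups.items()}

def intra_class_compactness(feats, labels):
    """Per class: mean cosine between each feature and its class mean (higher is more compact)."""
    means = class_means(feats, labels)
    sums, counts = defaultdict(float), defaultdict(int)
    for f, y in zip(feats, labels):
        sums[y] += _cos(f, means[y])
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}

def inter_class_separability(feats, labels):
    """Per class: 1 - largest cosine to another class mean (needs at least two classes)."""
    means = class_means(feats, labels)
    return {y: 1.0 - max(_cos(m, means[o]) for o in means if o != y)
            for y, m in means.items()}

def classifier_separability(classifiers):
    """Per classifier vector: 1 - largest cosine to any other classifier vector."""
    return {i: 1.0 - max(_cos(w, v) for j, v in enumerate(classifiers) if j != i)
            for i, w in enumerate(classifiers)}

feats = [[1.0, 0.05], [1.0, -0.05], [0.05, 1.0], [-0.05, 1.0]]
labels = [0, 0, 1, 1]
print(intra_class_compactness(feats, labels))
print(inter_class_separability(feats, labels))
print(classifier_separability([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]))
```

Under these definitions, comparing the per-class values of head and tail classes shows how balanced the feature space and the classifier are, which is the kind of comparison Figure 2 visualizes.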