Rediscovering BCE Loss for Uniform Classification

Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen, Jinming Duan

TL;DR

This work introduces uniform classification, in which a single dataset-wide threshold separates positive from negative class metrics, and defines corresponding metrics for sample-wise (SW), class-wise (CW), and uniform (Uni) classification. It derives a unified-threshold BCE loss $L_{\text{bce-u}}$ (and a diverse-threshold variant $L_{\text{bce-d}}$) that embeds a learnable bias to realize the unified threshold during training, linking bias behavior to feature uniformity. In experiments on six datasets with three backbones, BCE-based losses notably improve uniform classification accuracy and often improve sample-wise accuracy relative to SoftMax baselines; $L_{\text{bce-u}}$ performs particularly well, and its biases align with the learned threshold $t^*$. The results indicate that BCE losses yield more uniform, intra-class compact, and inter-class distinctive features, which improves open-set tasks such as face recognition, and they highlight the critical role of bias and normalization in achieving robust uniform classification.
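To make the idea of a BCE loss with a shared learnable bias concrete, here is a minimal NumPy sketch. It assumes the common form in which one scalar bias is subtracted from every class metric before a per-class sigmoid; the function name `bce_unified_bias_loss` and the toy inputs are illustrative and not taken from the paper, whose exact formulation may differ (for example, in normalization or scaling).

```python
import numpy as np

def bce_unified_bias_loss(logits, labels, bias):
    """Mean BCE loss with one bias shared by all classes (illustrative sketch).

    logits: (n_samples, n_classes) raw class metrics.
    labels: (n_samples,) integer class indices.
    bias:   scalar subtracted from every class metric (assumed form).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    shifted = logits - bias

    # Mark the true-class (positive) entry of each row.
    is_positive = np.zeros(shifted.shape, dtype=bool)
    is_positive[np.arange(len(labels)), labels] = True

    # Positives contribute log(1 + exp(-s)); negatives contribute log(1 + exp(s)).
    signed = np.where(is_positive, -shifted, shifted)

    # logaddexp(0, z) evaluates log(1 + exp(z)) in a numerically stable way.
    per_sample = np.logaddexp(0.0, signed).sum(axis=1)
    return float(per_sample.mean())

# Toy data: positive metrics are 2, negative metrics are 0.
toy_logits = [[2.0, 0.0], [0.0, 2.0]]
toy_labels = [0, 1]
loss_between = bce_unified_bias_loss(toy_logits, toy_labels, bias=1.0)
loss_outside = bce_unified_bias_loss(toy_logits, toy_labels, bias=5.0)
```

On this toy input, a bias placed between the positive and negative metrics gives a lower loss than one placed above both, which matches the intuition that training pushes the bias toward a separating threshold.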

Abstract

This paper introduces the concept of uniform classification, which employs a unified threshold to classify all samples rather than an adaptive threshold for each individual sample. We also propose uniform classification accuracy as a metric to measure a model's performance in uniform classification. Furthermore, beginning with a naive loss, we mathematically derive a loss function suitable for uniform classification, which is the BCE function integrated with a unified bias. We demonstrate that the unified threshold can be learned via the bias. Extensive experiments on six classification datasets and three feature extraction models show that, compared to the SoftMax loss, models trained with the BCE loss exhibit not only higher uniform classification accuracy but also higher sample-wise classification accuracy. In addition, the bias learned with the BCE loss is very close to the unified threshold used in uniform classification. The features extracted by models trained with the BCE loss not only possess uniformity but also demonstrate better intra-class compactness and inter-class distinctiveness, yielding superior performance on open-set tasks such as face recognition.
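The difference between sample-wise and uniform accuracy can be sketched in a few lines of NumPy. The sketch assumes that a sample counts as uniformly classified by a threshold $t$ when its true-class metric exceeds $t$ and every other class metric is at most $t$, and that uniform accuracy is the best such fraction over all $t$. The paper's formal definitions are not reproduced in this summary, so treat the function names and the exact inequality conventions as assumptions.

```python
import numpy as np

def sample_wise_accuracy(logits, labels):
    """Fraction of samples whose largest class metric is the true class."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    return float((logits.argmax(axis=1) == labels).mean())

def uniform_accuracy(logits, labels):
    """Best accuracy achievable with ONE threshold shared by all samples.

    Returns (accuracy, threshold). Requires at least two classes.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    rows = np.arange(len(labels))

    pos = logits[rows, labels]       # true-class metric per sample
    masked = logits.copy()
    masked[rows, labels] = -np.inf   # hide the true class
    neg_max = masked.max(axis=1)     # largest other-class metric per sample

    # A sample is correct for t in [neg_max, pos). The count of covering
    # intervals is maximized at some left endpoint, so only those are tried.
    best_acc, best_t = 0.0, float(neg_max.min())
    for t in np.unique(neg_max):
        acc = float(((pos > t) & (neg_max <= t)).mean())
        if acc > best_acc:
            best_acc, best_t = acc, float(t)
    return best_acc, best_t

# Toy data: every sample is ranked correctly, but no single threshold
# separates positives from negatives for all three samples at once.
toy_logits = [[5.0, 1.0, 0.0], [0.0, 2.0, 1.0], [3.0, 0.0, 6.0]]
toy_labels = [0, 1, 2]
sw_acc = sample_wise_accuracy(toy_logits, toy_labels)
uni_acc, uni_t = uniform_accuracy(toy_logits, toy_labels)
```

Here the sample-wise accuracy is 1.0 while the uniform accuracy is 2/3, which illustrates why a model can rank classes correctly per sample and still fail under a single shared threshold.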

Paper Structure (27 sections, 1 theorem, 42 equations, 3 figures, 18 tables)

Key Result

Corollary 1

For the dataset $\mathcal{D}=\bigcup_{i=1}^N\mathcal{D}_i$ with the model $\mathcal{M}$ and classifier $\mathcal{C}=\{c_i(\theta_i;\cdot)\}$, we suppose that $c_i(\theta_i;\cdot)$ has lower bound $A$ and upper bound $B$ ($B>A$) for $\forall~i$, and If the model and classifier are perfectly trained on $\mathcal{D}$ using $\mathcal{L}_{\text{bce-u}}(\bm X^{(i)})$, i.e., then the final learned bias

Figures (3)

  • Figure 1: Visual comparison of the performance of ResNet50 trained with $L_{\text{soft-nu}}$ and $L_{\text{bce-nu}}$ for various $\gamma$ on ImageNet-1K. Although $L_{\text{bce-nu}}$ performs poorly when $\gamma$ is too small or too large, for $\gamma$ in $[32,192]$ its uniform accuracy is much higher than that of $L_{\text{soft-nu}}$, and its sample-wise accuracy is slightly higher as well.
  • Figure 2: The distributions of positive and negative classification metrics of ResNet50 trained with $L_{\text{soft-nu}}$ (left) and $L_{\text{bce-nu}}$ (right) on ImageNet-1K. The smaller overlap between the positive and negative metrics of $L_{\text{bce-u}}$ and $L_{\text{bce-nu}}$ indicates that the BCE losses are more suitable for uniform classification than the SoftMax losses, and their final learned biases (the yellow lines) are closer to the corresponding unified thresholds.
  • Figure 3: The distributions of positive and negative classification metrics of ResNet50 trained with $L_{\text{soft-nd}}$ (left) and $L_{\text{bce-nd}}$ (right) on CUB with bias initialization 3 (top) and 7 (bottom). The bias learned by $L_{\text{soft-nd}}$ cannot distinguish the positive and negative metrics, indicating that it is unsuitable for (class-wise) uniform classification. Although $L_{\text{bce-nd}}$ is not suitable for them either, its learned biases (the yellow curve) effectively distinguish the positive and type II negative metrics, and, integrated with these biases, it presents more uniformity.

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Corollary 1