Rediscovering BCE Loss for Uniform Classification
Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen, Jinming Duan
TL;DR
This work introduces uniform classification, in which a single, dataset-wide threshold separates the positive from the negative class metrics of all samples, and defines corresponding accuracy metrics for SW, CW, and Uni classification. It derives a unified-threshold BCE loss $L_{bce-u}$ (and a diverse-threshold variant $L_{bce-d}$) that embeds a learnable bias to realize the unified threshold during training, and it links the behavior of the bias to feature uniformity. In experiments on six datasets with three backbones, BCE-based losses notably improve uniform classification accuracy and often improve sample-wise accuracy relative to SoftMax baselines; $L_{bce-u}$ performs particularly well, and its learned bias aligns with the learned threshold $t^*$. The results indicate that BCE losses yield features that are more uniform, more compact within classes, and more distinctive between classes, which improves open-set tasks such as face recognition, and they highlight the critical role of bias and normalization in achieving robust uniform classification.
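To make the difference between sample-wise and uniform classification concrete, here is a minimal NumPy sketch. It assumes (this is an interpretation of the summary above, not the paper's exact definition) that a sample counts as uniformly correct at threshold `t` when its positive metric exceeds `t` and all of its negative metrics fall below `t`. The function names and the toy data are hypothetical.

```python
import numpy as np

def sample_wise_accuracy(metrics, labels):
    """Fraction of samples whose largest metric belongs to the true class."""
    return float(np.mean(np.argmax(metrics, axis=1) == labels))

def uniform_accuracy(metrics, labels, t):
    """Fraction of samples separated by ONE shared threshold t.

    Assumed criterion: positive metric > t and every negative metric < t.
    """
    n = metrics.shape[0]
    rows = np.arange(n)
    pos = metrics[rows, labels]          # metric of the true class
    neg = metrics.astype(float)          # astype returns a copy
    neg[rows, labels] = -np.inf          # mask out the true class
    max_neg = neg.max(axis=1)            # hardest negative per sample
    return float(np.mean((pos > t) & (max_neg < t)))

def best_uniform_threshold(metrics, labels, candidates):
    """Grid-search the threshold that maximizes uniform accuracy."""
    accs = [uniform_accuracy(metrics, labels, t) for t in candidates]
    i = int(np.argmax(accs))
    return float(candidates[i]), accs[i]

# Toy example: three samples, two classes.
metrics = np.array([[0.9, 0.1],
                    [0.3, 0.2],
                    [0.4, 0.8]])
labels = np.array([0, 0, 1])

print(sample_wise_accuracy(metrics, labels))      # 1.0
print(uniform_accuracy(metrics, labels, t=0.5))   # 2 of 3 samples pass
```

In this toy case every sample is correct under the per-sample argmax rule, yet no single threshold separates all three samples, so the best achievable uniform accuracy is 2/3. This gap is the kind of behavior the uniform metric is meant to expose.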
Abstract
This paper introduces the concept of uniform classification, which employs a unified threshold to classify all samples rather than an adaptive threshold for each individual sample. We also propose uniform classification accuracy as a metric to measure a model's performance in uniform classification. Furthermore, beginning with a naive loss, we mathematically derive a loss function suitable for uniform classification, which is the BCE function integrated with a unified bias. We demonstrate that the unified threshold can be learned via the bias. Extensive experiments on six classification datasets and three feature extraction models show that, compared to the SoftMax loss, models trained with the BCE loss exhibit not only higher uniform classification accuracy but also higher sample-wise classification accuracy. In addition, the bias learned with the BCE loss is very close to the unified threshold used in uniform classification. The features extracted by models trained with the BCE loss not only possess uniformity but also demonstrate better intra-class compactness and inter-class distinctiveness, yielding superior performance on open-set tasks such as face recognition.
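As a rough illustration of "the BCE function integrated with a unified bias", the sketch below subtracts one shared scalar bias from every class metric before applying a standard binary cross-entropy to one-hot targets. This is a simplified reading of the abstract: the paper's exact formulation (for example, any feature normalization or scaling factor) is not reproduced here, and the function name `bce_unified_bias_loss` is hypothetical. In training, `bias` would be a learnable parameter updated by gradient descent; here it is a plain argument.

```python
import numpy as np

def bce_unified_bias_loss(logits, labels, bias):
    """BCE over all (sample, class) pairs with one shared bias.

    logits: (N, K) array of class metrics
    labels: (N,) integer class indices
    bias:   scalar shared by all samples and classes (assumed form)
    """
    n, k = logits.shape
    targets = np.zeros((n, k))
    targets[np.arange(n), labels] = 1.0   # one-hot targets

    z = logits - bias                     # shift every metric by the bias
    # Numerically stable BCE-with-logits:
    #   -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))] = softplus(z) - y*z
    per_entry = np.logaddexp(0.0, z) - targets * z
    return float(per_entry.sum(axis=1).mean())

# Well-separated metrics around the bias give a small loss.
logits = np.array([[5.0, -5.0],
                   [-5.0, 5.0]])
labels = np.array([0, 1])
print(bce_unified_bias_loss(logits, labels, bias=0.0))

# Shifting all metrics by 3 is compensated by a bias of 3.
print(bce_unified_bias_loss(logits + 3.0, labels, bias=3.0))
```

Because the loss pushes each positive metric above the bias and each negative metric below it, the bias plays the role of a dataset-wide decision threshold under this reading, which is consistent with the abstract's observation that the learned bias ends up close to the unified threshold.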
