SCS-SupCon: Sigmoid-based Common and Style Supervised Contrastive Learning with Adaptive Decision Boundaries
Bin Wang, Fadi Dornaika
TL;DR
SCS-SupCon targets fine-grained classification by replacing InfoNCE-style losses with a sigmoid-based pairwise contrastive loss whose decision boundaries are learnable, so the separation between positives and negatives adapts during training. It retains CS-SupCon's explicit style-distance constraint to disentangle common (class-relevant) features from style (intra-class variation) features, and uses a two-stage training scheme in which a classifier is trained on the common features. Empirically, it achieves state-of-the-art results across six datasets and diverse backbones, with notable gains on fine-grained tasks; ablation studies and statistical significance analyses support its effectiveness. The approach maintains computational efficiency while offering better discriminative power and generalization, and it opens pathways to future extensions such as angular margins and semi-supervised settings.
Abstract
Image classification is hindered by subtle inter-class differences and substantial intra-class variations, which limit the effectiveness of existing contrastive learning methods. Supervised contrastive approaches based on the InfoNCE loss suffer from negative-sample dilution and lack adaptive decision boundaries, which reduces their discriminative power in fine-grained recognition tasks. To address these limitations, we propose Sigmoid-based Common and Style Supervised Contrastive Learning (SCS-SupCon). Our framework introduces a sigmoid-based pairwise contrastive loss with learnable temperature and bias parameters to enable adaptive decision boundaries. This formulation emphasizes hard negatives, mitigates negative-sample dilution, and exploits supervision more effectively. In addition, an explicit style-distance constraint further disentangles style and content representations, leading to more robust feature learning. Comprehensive experiments on six benchmark datasets, including CUB200-2011 and Stanford Dogs, demonstrate that SCS-SupCon achieves state-of-the-art performance across both CNN and Transformer backbones. On CIFAR-100 with ResNet-50, SCS-SupCon improves top-1 accuracy over SupCon by approximately 3.9 percentage points and over CS-SupCon by approximately 1.7 percentage points under five-fold cross-validation. On fine-grained datasets, it outperforms CS-SupCon by 0.4 to 3.0 percentage points. Extensive ablation studies and statistical analyses further confirm the robustness and generalization of the proposed framework, with Friedman tests and Nemenyi post-hoc evaluations validating the stability of the observed improvements.
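
To make the idea of a sigmoid-based pairwise loss with a learnable temperature and bias more concrete, here is a minimal sketch in plain Python. The abstract does not give the exact loss, so the form used here is an assumption: each ordered pair (i, j) is treated as an independent binary decision, with logit `temperature * cosine_similarity + bias`, a target of +1 for same-label pairs and -1 otherwise. The function names, the default values of `temperature` and `bias`, and the averaging over pairs are all hypothetical choices made for illustration; in actual training the two scalars would be optimized together with the encoder rather than fixed.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def l2_normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sigmoid_pairwise_loss(embeddings, labels, temperature=10.0, bias=-5.0):
    """Illustrative sigmoid pairwise supervised contrastive loss.

    Assumed form (not taken from the paper): for every ordered pair
    i != j, the logit is temperature * cos_sim(i, j) + bias, and the
    pair contributes -log sigmoid(sign * logit), where sign is +1 for
    same-label pairs and -1 for different-label pairs.
    Requires at least two samples.
    """
    z = [l2_normalize(v) for v in embeddings]
    total, count = 0.0, 0
    for i in range(len(z)):
        for j in range(len(z)):
            if i == j:
                continue  # skip self-pairs
            sim = sum(a * b for a, b in zip(z[i], z[j]))
            sign = 1.0 if labels[i] == labels[j] else -1.0
            total -= log_sigmoid(sign * (temperature * sim + bias))
            count += 1
    return total / count


# Toy usage: two tight clusters in 2-D.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = sigmoid_pairwise_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = sigmoid_pairwise_loss(toy_embeddings, [0, 1, 0, 1])
print(loss_matching < loss_mismatched)
```

Under this assumed form, the decision boundary sits at the cosine similarity where the logit is zero, that is at `-bias / temperature` (0.5 with the defaults above), so changing either scalar moves the boundary. Because each pair is scored independently rather than normalized against all negatives in the batch, adding easy negatives does not shrink the contribution of a hard one, which is one way to read the abstract's claim about mitigating negative-sample dilution.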
