SCS-SupCon: Sigmoid-based Common and Style Supervised Contrastive Learning with Adaptive Decision Boundaries

Bin Wang, Fadi Dornaika

TL;DR

SCS-SupCon targets fine-grained classification by replacing InfoNCE-style losses with a sigmoid-based pairwise contrastive loss whose decision boundary is learnable, so the separation between positive and negative pairs adapts during training. It keeps CS-SupCon's explicit style-distance constraint to disentangle common (class-relevant) features from style (intra-class variation) features, and it uses a two-stage training scheme in which the classifier is trained on the common features only. Empirically, it achieves state-of-the-art results across six datasets and diverse backbones, with the most notable gains on fine-grained tasks; ablation studies and statistical significance analyses support these results. The approach maintains computational efficiency while improving discriminative power and generalization, and it opens pathways to future extensions such as angular margins and semi-supervised settings.

Abstract

Image classification is hindered by subtle inter-class differences and substantial intra-class variations, which limit the effectiveness of existing contrastive learning methods. Supervised contrastive approaches based on the InfoNCE loss suffer from negative-sample dilution and lack adaptive decision boundaries, thereby reducing discriminative power in fine-grained recognition tasks. To address these limitations, we propose Sigmoid-based Common and Style Supervised Contrastive Learning (SCS-SupCon). Our framework introduces a sigmoid-based pairwise contrastive loss with learnable temperature and bias parameters to enable adaptive decision boundaries. This formulation emphasizes hard negatives, mitigates negative-sample dilution, and more effectively exploits supervision. In addition, an explicit style-distance constraint further disentangles style and content representations, leading to more robust feature learning. Comprehensive experiments on six benchmark datasets, including CUB200-2011 and Stanford Dogs, demonstrate that SCS-SupCon achieves state-of-the-art performance across both CNN and Transformer backbones. On CIFAR-100 with ResNet-50, SCS-SupCon improves top-1 accuracy over SupCon by approximately 3.9 percentage points and over CS-SupCon by approximately 1.7 points under five-fold cross-validation. On fine-grained datasets, it outperforms CS-SupCon by 0.4 to 3.0 points. Extensive ablation studies and statistical analyses further confirm the robustness and generalization of the proposed framework, with Friedman tests and Nemenyi post-hoc evaluations validating the stability of the observed improvements.
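To make the idea of a sigmoid-based pairwise loss with learnable temperature and bias more concrete, the following is a minimal sketch in plain Python. The exact loss used by SCS-SupCon is not reproduced here. The form below (logit = t * scalar product + b, target +1 for same-class pairs and -1 otherwise, logistic loss on the signed logit) is an assumption, as are all function names and the toy values. In actual training, t and b would be learned by gradient descent; here they are fixed numbers.

```python
import math


def dot(u, v):
    """Scalar product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def softplus(x):
    """Numerically stable log(1 + exp(x)), i.e. -log(sigmoid(-x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid_pairwise_loss(features, labels, t, b):
    """Mean logistic loss over all unordered pairs (u, v) with u < v.

    ASSUMED form (not taken from the paper's equations):
      logit = t * <c_u, c_v> + b
      z     = +1 if labels match, else -1
      loss  = -log(sigmoid(z * logit)) = softplus(-z * logit)
    """
    n = len(features)
    total, count = 0.0, 0
    for u in range(n):
        for v in range(u + 1, n):
            z = 1.0 if labels[u] == labels[v] else -1.0
            logit = t * dot(features[u], features[v]) + b
            total += softplus(-z * logit)
            count += 1
    return total / count


def decision_boundary(t, b):
    """Scalar product at which the logit is zero (sigmoid output 0.5)."""
    return -b / t


# Toy, hypothetical data: four roughly unit-norm 2-d "common" features.
features = [[1.0, 0.0], [0.98, 0.199], [0.0, 1.0], [0.199, 0.98]]
consistent_labels = [0, 0, 1, 1]    # similar vectors share a class
inconsistent_labels = [0, 1, 0, 1]  # similar vectors are split across classes

t, b = 10.0, -5.0  # stand-ins for the learnable temperature and bias
loss_consistent = sigmoid_pairwise_loss(features, consistent_labels, t, b)
loss_inconsistent = sigmoid_pairwise_loss(features, inconsistent_labels, t, b)
boundary = decision_boundary(t, b)
```

Under this assumed form, each pair contributes its own independent term instead of competing inside a softmax over all negatives, and changing t or b moves the scalar-product threshold `-b / t` that separates positive from negative pairs.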

Paper Structure

This paper contains 34 sections, 8 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Motivation and core innovations of our proposed SCS-SupCon method. (a) Existing CS-SupCon employs an InfoNCE-based contrastive loss, causing dilution of negative-sample information by simultaneously comparing numerous negatives. (b) Our proposed SCS-SupCon explicitly introduces a sigmoid-based pairwise contrastive loss with learnable parameters (temperature $t$ and bias $b$), adaptively adjusting decision boundaries to effectively distinguish subtle differences between negative and positive pairs.
  • Figure 2: Illustration of CS-SupCon. Stage 1: Deep metric learning explicitly decomposes embeddings into distinct common and style subspaces. Stage 2: A linear classifier is trained solely on common features, thus disregarding style-related variations.
  • Figure 3: Overview of SCS-SupCon: Sigmoid-based Common and Style Supervised Contrastive Learning with Adaptive Decision Boundaries. Stage 1: Deep metric learning explicitly decomposes embeddings into common and style subspaces, utilizing a sigmoid-based contrastive loss with adaptive decision boundaries. Stage 2: A linear classifier is trained solely on the sigmoid-disentangled common features.
  • Figure 4: Illustration of the mapping from the scalar product $r_{uv} = \mathbf{c}_u \cdot \mathbf{c}_v$ to the logits of positive and negative pairs, the sigmoid output, and the resulting logistic loss. The sigmoid loss encourages learning deep features with the following property: the scalar product of a positive pair should lie beyond the decision boundary (indicated by the red dot), while the scalar product of a negative pair should lie on the opposite side of that boundary. The adaptive boundary itself is learnable.
  • Figure 5: Illustration of deep feature extraction and composition using Transformers within the proposed SCS-SupCon framework. The CLS token produced by the Transformer encoder is first projected through a two-layer MLP with a ReLU activation and then split into common (192-d) and style (64-d) subspaces by a fixed dimensional partition, without any attention mechanism. The sigmoid-based contrastive loss captures pairwise relationships on the common features, which strengthens feature disentanglement and is especially beneficial for fine-grained classification tasks.
  • ...and 4 more figures
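
The projection and fixed partition described in the Figure 5 caption can be sketched in plain Python as follows. Only the 192-d common and 64-d style sizes come from the caption. The CLS token width (384), the hidden width, the placement of the ReLU between the two linear layers, the common-first ordering of the split, and all function names are assumptions made for illustration, and the weights are random rather than trained.

```python
import random

COMMON_DIM = 192  # from the Figure 5 caption
STYLE_DIM = 64    # from the Figure 5 caption
CLS_DIM = 384     # ASSUMED encoder width (hypothetical)
HIDDEN_DIM = 384  # ASSUMED hidden width of the MLP (hypothetical)


def make_layer(in_dim, out_dim, rng):
    """Random (weight, bias) pair for a linear layer; weight is out_dim x in_dim."""
    scale = 1.0 / in_dim ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(x, weight, bias):
    """Apply y = W x + b to a single vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b0
            for row, b0 in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def project_and_split(cls_token, layer1, layer2):
    """Two-layer MLP followed by a fixed slice into (common, style).

    ASSUMPTION: the first COMMON_DIM coordinates are the common features
    and the remaining STYLE_DIM coordinates are the style features.
    """
    hidden = relu(linear(cls_token, *layer1))
    embedding = linear(hidden, *layer2)
    return embedding[:COMMON_DIM], embedding[COMMON_DIM:]


rng = random.Random(0)
layer1 = make_layer(CLS_DIM, HIDDEN_DIM, rng)
layer2 = make_layer(HIDDEN_DIM, COMMON_DIM + STYLE_DIM, rng)
cls_token = [rng.gauss(0.0, 1.0) for _ in range(CLS_DIM)]  # stand-in CLS token

common, style = project_and_split(cls_token, layer1, layer2)
```

Because the split is a plain index slice, it adds no parameters of its own; any separation between the two subspaces has to come from the losses applied to each part during Stage 1.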