Confidence-aware Contrastive Learning for Selective Classification

Yu-Chang Wu, Shen-Huan Lyu, Haopu Shang, Xiangyu Wang, Chao Qian

TL;DR

This paper tackles selective classification by deriving a generalization bound that links predictive confidence to feature representation. It introduces CCL-SC, a confidence-aware contrastive learning framework that optimizes the feature space to pull together correctly classified samples and push apart misclassified ones, with each sample reweighted by the model's predictive confidence. The method uses a MoCo-style dual-queue setup to construct positive and negative samples and defines a CSC loss that integrates softmax response (SR) confidence into the contrastive objective. Empirically, CCL-SC achieves lower selective risk than state-of-the-art methods on CIFAR-10, CIFAR-100, CelebA, and ImageNet at most coverage levels, and it can be combined with existing selective-classification techniques for further gains. Overall, the study demonstrates the practical value of feature-level optimization for selective classification and offers a principled avenue to improve reliability in high-stakes applications.
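
The dual-queue idea and the confidence reweighting can be illustrated with a small, dependency-free sketch. It is not the paper's CSC loss: the class `DualQueue`, the function `weighted_contrastive_loss`, the cosine similarity, the `temperature` value, and the InfoNCE-style form are all assumptions made for illustration. Only the sample definition follows the paper's description, namely that a queued sample counts as positive (or negative) for an anchor when its prediction matches the anchor's label and is correct (or incorrect).

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two feature vectors (small epsilon avoids /0)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


class DualQueue:
    """Hypothetical helper: two fixed-size FIFO queues of (feature, prediction)."""

    def __init__(self, maxlen):
        self.correct = deque(maxlen=maxlen)    # samples with prediction == label
        self.incorrect = deque(maxlen=maxlen)  # samples with prediction != label

    def push(self, feature, predicted, label):
        target = self.correct if predicted == label else self.incorrect
        target.append((feature, predicted))

    def samples_for(self, anchor_label):
        """Return queued features predicted as the anchor's label, split into
        positives (correct predictions) and negatives (incorrect predictions)."""
        pos = [f for f, p in self.correct if p == anchor_label]
        neg = [f for f, p in self.incorrect if p == anchor_label]
        return pos, neg


def weighted_contrastive_loss(anchor, confidence, positives, negatives,
                              temperature=0.1):
    """Illustrative InfoNCE-style loss for one anchor, scaled by its confidence.

    Assumed form, not the paper's exact objective.
    """
    if not positives:
        return 0.0
    pos_logits = [cosine(anchor, p) / temperature for p in positives]
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    all_logits = pos_logits + neg_logits
    # Stable log-sum-exp over every candidate in the denominator.
    m = max(all_logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in all_logits))
    mean_log_prob = sum(z - log_denom for z in pos_logits) / len(pos_logits)
    return -confidence * mean_log_prob
```

In this sketch an anchor whose feature lies close to the queued positives yields a small loss, one that lies close to the queued negatives yields a large loss, and the confidence factor scales the penalty linearly.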

Abstract

Selective classification enables models to make predictions only when they are sufficiently confident, aiming to enhance safety and reliability, which is important in high-stakes scenarios. Previous methods mainly use deep neural networks and focus on modifying the architecture of classification layers to enable the model to estimate the confidence of its prediction. This work provides a generalization bound for selective classification, disclosing that optimizing feature layers helps improve the performance of selective classification. Inspired by this theory, we propose to explicitly improve the selective classification model at the feature level for the first time, leading to a novel Confidence-aware Contrastive Learning method for Selective Classification, CCL-SC, which similarizes the features of homogeneous instances and differentiates the features of heterogeneous instances, with the strength controlled by the model's confidence. The experimental results on typical datasets, i.e., CIFAR-10, CIFAR-100, CelebA, and ImageNet, show that CCL-SC achieves significantly lower selective risk than state-of-the-art methods, across almost all coverage degrees. Moreover, it can be combined with existing methods to bring further improvement.
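
To make the notions of coverage and selective risk concrete, the following is a minimal sketch of how the empirical selective risk could be computed at a chosen coverage. It assumes that confidence is the maximum softmax probability and that the loss is 0/1 error; the function names `softmax` and `selective_risk` are hypothetical and do not come from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selective_risk(logits_batch, labels, coverage):
    """Empirical 0/1 error on the most confident `coverage` fraction of samples.

    Assumption of this sketch: confidence = maximum softmax probability.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    scored = []
    for logits, label in zip(logits_batch, labels):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        scored.append((probs[pred], pred != label))
    # Predict on the most confident samples and abstain on the rest.
    scored.sort(key=lambda t: t[0], reverse=True)
    n_keep = max(1, math.ceil(coverage * len(scored)))
    kept = scored[:n_keep]
    return sum(err for _, err in kept) / n_keep
```

When the low-confidence samples are the ones the model gets wrong, lowering the coverage lowers the selective risk, which is the trade-off that selective classification methods try to improve.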

Paper Structure

This paper contains 30 sections, 2 theorems, 26 equations, 3 figures, 18 tables, 1 algorithm.

Key Result

Theorem 4.1

For all $\rho, \rho^{\prime}, \alpha, \beta, \lambda > 0$ and all $\delta > 0$, with probability at least $1-\delta$ over a training set of size $m$, the generalization bound for selective classification holds (the inequality itself is not reproduced in this summary), where $\|l\|_2$ denotes the L2-norm of the classification layer $l$'s parameters, and $\tilde{\rho}=\min \{\rho/(4 \alpha), \rho^{\prime}/(4 \beta \lambda+2 \alpha)\}$.

Figures (3)

  • Figure 1: Illustration of the proposed CCL-SC method. The right part outlines our definition of positive/negative samples: a sample is positive/negative if the prediction matches the anchor's label and is correct/incorrect. Two independent queues store positive and negative samples, respectively. The middle part displays the characteristic of the proposed CSC loss: prompting the model to separate correctly classified and misclassified samples at the feature level and focus on samples with high prediction confidence. The black arrow on the left represents forward calculation, while the yellow and red ones represent backpropagation of the cross-entropy and CSC loss, respectively.
  • Figure 2: Changes in (a) the intra-class variance and (b) the bound in Theorem 4.1 for different methods during training on CIFAR-100. Panel (b) also includes the generalization error of CCL-SC.
  • Figure 3: t-SNE visualization of SR and CCL-SC feature representations on the CIFAR-10 dataset at 95% coverage. Point colors indicate class categories. Light-colored points represent samples on which the model abstains from predicting.
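
The intra-class variance tracked in Figure 2(a) can be illustrated with a short sketch. The exact definition used in the paper is not given in this summary, so the version below assumes one common choice: the mean squared Euclidean distance between each feature and the centroid of its class. The function name `intra_class_variance` is hypothetical.

```python
from collections import defaultdict


def intra_class_variance(features, labels):
    """Mean squared distance of each feature to its class centroid.

    Assumed definition for illustration; the paper may normalize differently.
    """
    if not features:
        raise ValueError("features must be non-empty")
    groups = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(f)
    total, count = 0.0, 0
    for members in groups.values():
        dim = len(members[0])
        # Per-class centroid, computed coordinate by coordinate.
        centroid = [sum(f[d] for f in members) / len(members) for d in range(dim)]
        for f in members:
            total += sum((f[d] - centroid[d]) ** 2 for d in range(dim))
            count += 1
    return total / count
```

A lower value means that the features of each class are more tightly clustered, which is the property a feature-level objective of this kind is meant to encourage.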

Theorems & Definitions (4)

  • Theorem 4.1
  • proof
  • Lemma 2.1
  • proof: Proof of Lemma 2.1