Know When to Abstain: Optimal Selective Classification with Likelihood Ratios

Alvin Heng, Harold Soh

TL;DR

This work tackles selective classification under covariate shift by recasting abstention as a Neyman–Pearson hypothesis test, showing that the optimal selector is a monotone transformation of the likelihood ratio $p_c({\mathbf{x}})/p_w({\mathbf{x}})$ and unifying prior post-hoc scores under this framework. It introduces two distance-based NP-optimal selectors, $\Delta$-MDS and $\Delta$-KNN, along with a simple linear combination strategy that blends distance- and logit-based scores (e.g., $\Delta$-MDS-RLog, $\Delta$-KNN-RLog). The paper provides theoretical justification for these scores and demonstrates strong empirical gains on vision and language benchmarks under covariate shift, including with vision-language models like CLIP and EVA. The results highlight the practical impact of likelihood-ratio-based selective classification for robust deployment in real-world, distribution-shifted settings, with public code available for replication.

Abstract

Selective classification enhances the reliability of predictive models by allowing them to abstain from making uncertain predictions. In this work, we revisit the design of optimal selection functions through the lens of the Neyman--Pearson lemma, a classical result in statistics that characterizes the optimal rejection rule as a likelihood ratio test. We show that this perspective not only unifies the behavior of several post-hoc selection baselines, but also motivates new approaches to selective classification which we propose here. A central focus of our work is the setting of covariate shift, where the input distribution at test time differs from that at training. This realistic and challenging scenario remains relatively underexplored in the context of selective classification. We evaluate our proposed methods across a range of vision and language tasks, including both supervised learning and vision-language models. Our experiments demonstrate that our Neyman--Pearson-informed methods consistently outperform existing baselines, indicating that likelihood ratio-based selection offers a robust mechanism for improving selective classification under covariate shifts. Our code is publicly available at https://github.com/clear-nus/sc-likelihood-ratios.
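Selective classifiers are usually compared by trading coverage (the fraction of inputs on which the model predicts) against selective risk (the error rate on those accepted inputs). The snippet below is a minimal sketch of that computation, not the authors' evaluation code. The function name `risk_coverage_curve` and the toy scores are assumptions made for illustration, and tied scores are treated as separate cutoffs.

```python
import numpy as np

def risk_coverage_curve(scores, correct):
    """Return (coverage, selective_risk) for every acceptance cutoff.

    scores  : selector scores; higher means "more likely to be correct"
    correct : 1 if the classifier's prediction was right, 0 otherwise
    """
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Accept samples in order of decreasing score (most confident first).
    order = np.argsort(-scores, kind="stable")
    errors = 1.0 - correct[order]
    n_accepted = np.arange(1, len(scores) + 1)
    coverage = n_accepted / len(scores)
    # Error rate among the accepted samples at each cutoff.
    selective_risk = np.cumsum(errors) / n_accepted
    return coverage, selective_risk

# Toy data (hypothetical): five predictions with selector scores.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
correct = [1, 1, 0, 1, 0]
coverage, risk = risk_coverage_curve(scores, correct)
```

A better selector pushes the errors toward the low-score end, which lowers the selective risk at every coverage level below 1.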

Paper Structure

This paper contains 30 sections, 10 theorems, 26 equations, 2 figures, 5 tables, 2 algorithms.

Key Result

Lemma 1

Let $Z \in \mathbb{R}^d$ be a random variable, and consider the hypotheses $\mathcal{H}_0: Z \sim P_0$ and $\mathcal{H}_1: Z \sim P_1$, where $P_0$ and $P_1$ have densities $p_0$ and $p_1$ that are strictly positive on a shared support $\mathcal{Z} \subset \mathbb{R}^d$. For any measurable acceptance region $A \subset \mathcal{Z}$ under $\mathcal{H}_0$, define the type I error (false rejection) as $\alpha(A) = P_0(Z \notin A)$, [...] and satisfies $\alpha(A^*)$ [...]
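To make the likelihood-ratio idea concrete, here is a small sketch of a test that accepts $\mathcal{H}_0$ whenever $\log p_0(z) - \log p_1(z)$ clears a threshold. Everything specific in it is an assumption for illustration: the one-dimensional Gaussian densities, their means and shared standard deviation, and the default threshold of zero. None of these values come from the paper.

```python
import math

def gaussian_logpdf(z, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) at z."""
    return -0.5 * ((z - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

def log_likelihood_ratio(z, mean0=1.0, mean1=-1.0, std=1.0):
    """log p_0(z) - log p_1(z) for two assumed Gaussian hypotheses."""
    return gaussian_logpdf(z, mean0, std) - gaussian_logpdf(z, mean1, std)

def accept(z, log_threshold=0.0):
    """Accept H_0 when the log-likelihood ratio is at least the threshold."""
    return log_likelihood_ratio(z) >= log_threshold

llr = log_likelihood_ratio(0.5)   # with these assumed parameters the ratio reduces to 2 * z
decisions = [accept(0.5), accept(-0.5)]
```

With equal variances the log-ratio is an increasing function of $z$, so thresholding $z$ directly yields the same acceptance regions. This is the sense in which any monotone transformation of the likelihood ratio defines an equivalent selector.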

Figures (2)

  • Figure 1: Illustration of our proposed Neyman--Pearson optimal distance-based selective classification methods. We estimate the likelihoods of correct and incorrect predictions ($p_c$ and $p_w$) as a function of distances to training sets consisting of correctly and incorrectly classified samples: $s({\bm{x}})=f(D_c({\bm{x}}), D_w({\bm{x}}))$, where $f$ here denotes a function. For example, ${\bm{x}}_1$ is “closer” to $p_c$ and “farther” from $p_w$ than ${\bm{x}}_2$, and should therefore receive a higher score.
  • Figure 2: Risk-coverage curves of various selector methods for CLIP (top row) and EVA (bottom row).
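The distance-based scoring idea in Figure 1 can be sketched as follows. This is a hedged illustration rather than the paper's definition of $\Delta$-KNN: it assumes $f(D_c, D_w) = D_w - D_c$, takes $D$ to be the Euclidean distance to the $k$-th nearest neighbor in a feature bank, and uses made-up 2-D features. The helper names `kth_nn_distance` and `delta_distance_score` are hypothetical.

```python
import numpy as np

def kth_nn_distance(x, bank, k=1):
    """Euclidean distance from x to its k-th nearest neighbor in bank."""
    dists = np.linalg.norm(bank - x, axis=1)
    return np.sort(dists)[k - 1]

def delta_distance_score(x, correct_bank, wrong_bank, k=1):
    """Assumed score D_w(x) - D_c(x): high when x is near correctly
    classified training features and far from misclassified ones."""
    d_c = kth_nn_distance(x, correct_bank, k)
    d_w = kth_nn_distance(x, wrong_bank, k)
    return d_w - d_c

# Hypothetical feature banks built from training samples the classifier
# got right (correct_bank) and wrong (wrong_bank).
correct_bank = np.array([[0.0, 0.0], [1.0, 0.0]])
wrong_bank = np.array([[4.0, 0.0], [5.0, 0.0]])

x1 = np.array([0.0, 0.0])   # lies among the correctly classified features
x2 = np.array([3.0, 0.0])   # lies closer to the misclassified features
s1 = delta_distance_score(x1, correct_bank, wrong_bank)
s2 = delta_distance_score(x2, correct_bank, wrong_bank)
```

Here `x1` receives the higher score, matching the caption's intuition that a point closer to $p_c$ and farther from $p_w$ should be accepted first.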

Theorems & Definitions (14)

  • Lemma 1 (Neyman--Pearson; Neyman and Pearson, 1933; Lehmann, 1986)
  • Corollary 1 (Informal)
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 2
  • Theorem 3
  • proof
  • Theorem 3
  • proof
  • ...and 4 more