Singleton-Optimized Conformal Prediction

Tao Wang, Yan Sun, Edgar Dobriban

TL;DR

Conformal prediction guarantees coverage but often yields large, ambiguous prediction sets. This paper derives a singleton-optimized nonconformity score via a Lagrangian relaxation that balances the singleton objective against expected set length, yielding per-instance top-$j$ label sets and nested conformal prediction sets. The computation reduces to finding a lower convex hull in $\mathbb{R}^2$, which gives an $O(K)$ algorithm for both the score and the split conformal prediction sets. Empirically, Singleton-Optimized Conformal Prediction (SOCOP) increases singleton frequency, in some cases by over 20%, with little change in average set size across ImageNet, TissueMNIST, and MMLU, so it produces more unambiguous predictions while preserving coverage.

Abstract

Conformal prediction can be used to construct prediction sets that cover the true outcome with a desired probability, but can sometimes lead to large prediction sets that are costly in practice. The most useful outcome is a singleton prediction (an unambiguous decision), yet existing efficiency-oriented methods primarily optimize average set size. Motivated by this, we propose a new nonconformity score that aims to minimize the probability of producing non-singleton sets. Starting from a non-convex constrained optimization problem as a motivation, we provide a geometric reformulation and an associated algorithm for computing the nonconformity score and the corresponding split conformal prediction sets in $O(K)$ time for $K$-class problems. Using this score in split conformal prediction leads to our proposed Singleton-Optimized Conformal Prediction (SOCOP) method. We evaluate our method in experiments on image classification and LLM multiple-choice question-answering, comparing with standard nonconformity scores such as the (negative) label probability estimates and their cumulative distribution function, both of which are motivated by optimizing length. The results show that SOCOP increases singleton frequency (sometimes by over 20%) compared to the above scores, with minimal impact on average set size.
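
To make the baseline concrete, here is a minimal sketch of split conformal prediction with the negative label probability score mentioned above, together with the two quantities the abstract compares: average set size and singleton frequency. It is not the SOCOP score. The function names and the toy inputs are assumptions made for illustration, and the quantile uses the usual finite-sample $\lceil (n+1)(1-\alpha) \rceil$ rank.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Finite-sample conformal quantile of the calibration scores."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic to use
    if k > n:
        return np.inf  # too few calibration points: every label is included
    return np.sort(cal_scores)[k - 1]

def prediction_sets(probs, qhat):
    """Boolean (n_test, K) mask: label y is included when -p_y <= qhat."""
    return -probs <= qhat

def summarize(sets, labels):
    """Coverage, average set size, and singleton frequency of the sets."""
    sizes = sets.sum(axis=1)
    covered = sets[np.arange(len(labels)), labels]
    return {
        "coverage": float(covered.mean()),
        "avg_size": float(sizes.mean()),
        "singleton_freq": float((sizes == 1).mean()),
    }

# Toy usage (hypothetical numbers): calibration scores are -p of the true label.
cal_probs = np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.6, 0.2]])
cal_labels = np.array([0, 1, 1])
cal_scores = -cal_probs[np.arange(3), cal_labels]
qhat = conformal_quantile(cal_scores, alpha=0.5)
test_probs = np.array([[0.8, 0.1, 0.1], [0.4, 0.35, 0.25]])
stats = summarize(prediction_sets(test_probs, qhat), np.array([0, 2]))
```

Any other nonconformity score can be swapped in by changing how `cal_scores` and the inclusion rule in `prediction_sets` are computed; the calibration step stays the same.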

Paper Structure

This paper contains 25 sections, 7 theorems, 25 equations, 5 figures, 17 tables, 3 algorithms.

Key Result

Lemma 2.1

For any $\eta \geqslant0$ and $\gamma \in \Delta_{K-1}$, $S_{\eta,\gamma}$ is the set of top-$j$ labels for some $j$ that depends on $\eta$ and $\gamma$.
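
The definition of $S_{\eta,\gamma}$ is not reproduced in this summary. As a hedged illustration of why a top-$j$ structure is plausible, the sketch below assumes a per-instance Lagrangian objective of the form $\mathbb{1}\{|S|>1\} + \lambda |S| - \eta \sum_{y \in S} \gamma_y$, which combines a singleton penalty, a length penalty, and a coverage reward. This form, and all function names, are assumptions rather than the paper's stated definition. Under it, swapping any label in $S$ for a more probable one outside $S$ keeps $|S|$ fixed and lowers the objective, so a minimizer must consist of the $j$ most probable labels. The brute-force search checks this on a small example.

```python
from itertools import combinations
import numpy as np

def assumed_objective(subset, gamma, lam, eta):
    """Assumed objective: 1{|S|>1} + lam*|S| - eta * sum of gamma over S."""
    size = len(subset)
    return float(size > 1) + lam * size - eta * sum(gamma[i] for i in subset)

def brute_force_minimizer(gamma, lam, eta):
    """Exhaustively search all 2^K label subsets (only feasible for small K)."""
    K = len(gamma)
    best, best_val = (), assumed_objective((), gamma, lam, eta)
    for j in range(1, K + 1):
        for subset in combinations(range(K), j):
            val = assumed_objective(subset, gamma, lam, eta)
            if val < best_val:
                best, best_val = subset, val
    return set(best)

def is_top_j(subset, gamma):
    """True if subset equals the |subset| most probable labels."""
    order = np.argsort(-np.asarray(gamma), kind="stable")
    return subset == set(order[: len(subset)].tolist())

# Probability vector and lambda taken from the Figure 4 caption.
gamma = [0.202, 0.172, 0.157, 0.143, 0.127, 0.077, 0.057, 0.031, 0.027, 0.007]
checks = [is_top_j(brute_force_minimizer(gamma, 0.1, eta), gamma)
          for eta in (0.0, 0.3, 1.0, 5.0, 20.0, 100.0)]
```

The exhaustive search costs $O(2^K)$ and is only a sanity check. The paper's convex-hull algorithm, which is not reproduced here, is what achieves $O(K)$.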

Figures (5)

  • Figure 1: Lower convex hull for a simulated probability vector with $K=10$.
  • Figure 2: Evaluation results for ResNet152-v2 on ImageNet-Val from Table \ref{tab:resnet_lambda_imgval}. LAS denotes Least Ambiguous Sets. Left: average size and $P(\textnormal{size}>1)$ as functions of $\lambda$. Right: (average size, $P(\textnormal{size}>1)$) pairs, with each point corresponding to a specific $\lambda$. Results for the hyperparameter $\lambda$ selected by the Kneedle algorithm (Satopaa et al., 2011) are highlighted.
  • Figure 3: Set sizes produced with ResNet152-v2 on ImageNet-Val. LAS denotes Least Ambiguous Sets. Bars indicate empirical probabilities of set sizes, and shaded bins mark non-singleton set sizes where SOCOP assigns higher mass. Reported $\Delta$ values denote the cumulative probability difference on shaded bins. The x-axis is truncated at 20 for clarity.
  • Figure 4: Primal and dual views for $\lambda=0.1$ and the example probability vector $\gamma = [0.202, 0.172, 0.157, 0.143, 0.127, 0.077, 0.057, 0.031, 0.027, 0.007]$.
  • Figure 5: Optimal set size $\kappa(\eta;\gamma)$ with the same parameters as Figure \ref{fig:dual_view_example}. The tie-breaking rule does not affect the value of the nonconformity score.

Theorems & Definitions (11)

  • Lemma 2.1: The structure of singleton optimal sets
  • Lemma 2.2: Nested Sets Property
  • Definition 2.3: Singleton-optimized nonconformity score
  • Corollary 2.4: Properties of the optimal index function
  • Theorem 2.5: Characterizing the optimal index function $\kappa$
  • Corollary 2.6: Recovery of singleton objective optimization and least ambiguous sets
  • Lemma B.1
  • proof
  • Lemma B.2: Unique Optimality on Vertex Intervals
  • proof
  • ...and 1 more