Singleton-Optimized Conformal Prediction
Tao Wang, Yan Sun, Edgar Dobriban
TL;DR
Conformal prediction guarantees coverage but often yields large, ambiguous prediction sets. This paper derives a singleton-optimized nonconformity score via a Lagrangian relaxation that balances the singleton objective against expected set length, yielding per-instance top-$j$ label sets and nested conformal prediction sets. The computation reduces to finding a lower convex hull in $\mathbb{R}^2$, enabling an $O(K)$ algorithm for both scoring and split conformal prediction. Empirically, the resulting method, Singleton-Optimized Conformal Prediction (SOCOP), increases singleton frequency, in some cases by over 20%, with only modest increases in average set size across ImageNet, TissueMNIST, and MMLU. This suggests substantial practical gains in producing unambiguous predictions while preserving coverage.
Abstract
Conformal prediction can be used to construct prediction sets that cover the true outcome with a desired probability, but it can sometimes lead to large prediction sets that are costly in practice. The most useful outcome is a singleton prediction (an unambiguous decision), yet existing efficiency-oriented methods primarily optimize average set size. Motivated by this, we propose a new nonconformity score that aims to minimize the probability of producing non-singleton sets. Taking a non-convex constrained optimization problem as motivation, we provide a geometric reformulation and an algorithm that computes the nonconformity score and the corresponding split conformal prediction sets in $O(K)$ time for $K$-class problems. Using this score in split conformal prediction leads to our proposed Singleton-Optimized Conformal Prediction (SOCOP) method. We evaluate our method in experiments on image classification and LLM multiple-choice question answering, comparing it with standard nonconformity scores such as the (negative) label probability estimates and their cumulative distribution function, both of which are motivated by optimizing length. The results show that SOCOP increases singleton frequency (sometimes by over 20%) compared to the above scores, with minimal impact on average set size.
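
To make the evaluation setup concrete, the following is a minimal sketch of split conformal prediction with one of the baseline scores named above, the negative label probability, together with the three quantities the abstract discusses: coverage, singleton frequency, and average set size. It does not implement the SOCOP score itself. The helper names (`conformal_quantile`, `prediction_sets`, `summarize`) and the synthetic softmax outputs are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic to use
    if k > n:
        return np.inf  # too few calibration points: every label is included
    return np.sort(cal_scores)[k - 1]

def prediction_sets(probs, qhat):
    """Boolean (n, K) mask: label y is in the set when its score -p(y|x) <= qhat."""
    return -probs <= qhat

def summarize(sets, labels):
    """Empirical coverage, singleton frequency, and average set size."""
    sizes = sets.sum(axis=1)
    return {
        "coverage": float(sets[np.arange(len(labels)), labels].mean()),
        "singleton_freq": float((sizes == 1).mean()),
        "avg_size": float(sizes.mean()),
    }

# Synthetic stand-in for a classifier's softmax outputs (assumption, not the paper's data).
rng = np.random.default_rng(0)
n, K, alpha = 4000, 10, 0.1
labels = rng.integers(0, K, size=n)
logits = rng.normal(size=(n, K))
logits[np.arange(n), labels] += 2.0  # make the true label more likely on average
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Split the data: first half for calibration, second half for testing.
n_cal = n // 2
cal_scores = -probs[:n_cal][np.arange(n_cal), labels[:n_cal]]
qhat = conformal_quantile(cal_scores, alpha)

stats = summarize(prediction_sets(probs[n_cal:], qhat), labels[n_cal:])
print(stats)
```

Swapping in a different nonconformity score only changes how `cal_scores` and the membership test in `prediction_sets` are computed; the calibration quantile and the reported metrics stay the same, which is what makes singleton frequency directly comparable across scores.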
