Provably Minimum-Length Conformal Prediction Sets for Ordinal Classification

Zijian Zhang, Xinyu Chen, Yuanjie Shi, Liyuan Lillian Ma, Zifan Xu, Yan Yan

TL;DR

This paper addresses uncertainty quantification in ordinal classification by introducing two model-agnostic conformal predictors, min-CPS and min-RCPS. The core idea is instance-level minimum-length covering, solved by a sliding-window algorithm that runs in time linear in the number of labels and returns exactly optimal intervals while preserving marginal coverage under exchangeability. A length-regularized variant, min-RCPS, further improves efficiency by penalizing interval length without sacrificing coverage. Experiments on four diverse datasets show that the methods reduce average prediction-set length by roughly 14–15% and run significantly faster than strong baselines, supporting both the theoretical guarantees and the practical value for high-stakes ordinal tasks.

Abstract

Ordinal classification is widely used in high-stakes applications, e.g., medical imaging and diagnosis, where reliable uncertainty quantification (UQ) is essential for decision making. Conformal prediction (CP) is a general UQ framework that provides statistically valid guarantees, which is especially useful in practice. However, prior ordinal CP methods mainly rely on heuristic algorithms or restrictively require the underlying model to predict a unimodal distribution over ordinal labels. Consequently, they either offer limited insight into coverage-efficiency trade-offs or lack the model-agnostic and distribution-free nature favored by CP methods. We fill this gap by proposing an ordinal-CP method that is model-agnostic and provides instance-level optimal prediction intervals. Specifically, we formulate conformal ordinal classification as a minimum-length covering problem at the instance level. To solve this problem, we develop a sliding-window algorithm that is optimal on each calibration instance, with time complexity only linear in K, the number of label candidates. This per-instance local optimality also improves predictive efficiency in expectation. Moreover, we propose a length-regularized variant that shrinks prediction set size while preserving coverage. Experiments on four benchmark datasets from diverse domains demonstrate that the proposed methods significantly improve predictive efficiency over baselines (a 15% decrease in prediction set size on average over the four datasets).
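
The instance-level minimum-length covering idea described in the abstract can be illustrated with a short two-pointer sketch. This is not the paper's algorithm verbatim: it assumes the problem is "find the shortest contiguous label window whose predicted probability mass is at least $\tau$", and the function name `min_length_interval`, the 0-based indexing, and the tolerance `eps` are hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple


def min_length_interval(probs: Sequence[float], tau: float) -> Tuple[int, int]:
    """Shortest contiguous window (l, u), 0-based inclusive, with mass >= tau.

    Assumption (not from the source): probs are nonnegative class
    probabilities over K ordered labels and tau lies in (0, 1].
    """
    K = len(probs)
    eps = 1e-12              # tolerance for floating-point round-off
    best = (0, K - 1)        # fallback: the full label range
    best_len = K
    mass = 0.0               # probability mass of the current window [l, u]
    l = 0
    for u in range(K):       # the right end visits each label once
        mass += probs[u]
        # Advance the left end while the window still covers tau without it.
        while l < u and mass - probs[l] >= tau - eps:
            mass -= probs[l]
            l += 1
        # Record the window if it is feasible and strictly shorter.
        if mass >= tau - eps and (u - l + 1) < best_len:
            best, best_len = (l, u), u - l + 1
    return best
```

Each pointer only moves forward, so the loop does O(K) work in total, in line with the linear complexity claimed above. Because the window is chosen from the probabilities alone, no unimodality assumption on the model output is needed in this sketch.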

Paper Structure

This paper contains 50 sections, 9 theorems, 29 equations, 3 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

(Optimality and complexity of Algorithm alg:instance_level_min_length_covering) Let $K \in \mathbb{N}$ and $\tau \in (0, 1]$. For any input $X \in \mathcal{X}$, Algorithm alg:instance_level_min_length_covering: (i) returns $(l^*, u^*)$ that is guaranteed to exactly solve Problem (eq:instance_coverage_pr

Figures (3)

  • Figure 1: Impact of the regularization hyper-parameter $\lambda$ in min-RCPS on coverage and prediction set size on the Avocado Price dataset. When the coverage is guaranteed, the prediction set size initially decreases as $\lambda$ increases from $0$ to $0.003$, where min-RCPS outperforms min-CPS by 2.13%$\downarrow$ in terms of prediction set size. We find that setting $\lambda$ to a relatively small value is sufficient to improve the predictive efficiency, and the same holds on the other datasets.
  • Figure 2: The empirical coverage rate $F(\tau)$, verifying that $F(\tau)$ increases monotonically in $\tau$ (see the min-CPS subsection).
  • Figure 3: Impact of the regularization hyper-parameter $\lambda$ in min-RCPS on coverage and prediction set size on the UTKFace dataset. When the coverage is guaranteed, the prediction set size initially increases as $\lambda$ increases from $0$ to $10^{-5}$, then decreases as $\lambda$ increases from $10^{-5}$ to $3 \times 10^{-5}$, where min-RCPS outperforms min-CPS by 0.05%$\downarrow$ in terms of prediction set size. We find that setting $\lambda$ to a relatively small value is sufficient to improve the predictive efficiency, and the same holds on the other datasets.
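
The monotonicity of the empirical coverage $F(\tau)$ noted in the Figure 2 caption suggests one simple way to calibrate $\tau$: scan a grid in increasing order and keep the first value whose calibration coverage reaches the target $1 - \alpha$. The sketch below is a simplified illustration, not the paper's calibration procedure. It assumes the prediction set is the shortest contiguous window with mass at least $\tau$, omits any finite-sample correction, and the names `shortest_window`, `empirical_coverage`, and `pick_tau` are hypothetical.

```python
from typing import List, Sequence, Tuple


def shortest_window(p: Sequence[float], tau: float) -> Tuple[int, int]:
    """Shortest contiguous (l, u), 0-based inclusive, with mass >= tau."""
    eps, best, best_len = 1e-12, (0, len(p) - 1), len(p)
    mass, l = 0.0, 0
    for u in range(len(p)):
        mass += p[u]
        # Drop left labels while the window still covers tau without them.
        while l < u and mass - p[l] >= tau - eps:
            mass -= p[l]
            l += 1
        if mass >= tau - eps and (u - l + 1) < best_len:
            best, best_len = (l, u), u - l + 1
    return best


def empirical_coverage(probs: List[Sequence[float]],
                       labels: Sequence[int], tau: float) -> float:
    """Fraction of calibration points whose true label falls in the window."""
    hits = 0
    for p, y in zip(probs, labels):
        l, u = shortest_window(p, tau)
        hits += int(l <= y <= u)
    return hits / len(labels)


def pick_tau(probs: List[Sequence[float]], labels: Sequence[int],
             alpha: float, grid: Sequence[float]) -> float:
    """Smallest grid value whose calibration coverage is at least 1 - alpha.

    Assumption: a plain grid search with no finite-sample adjustment.
    Falls back to tau = 1.0 (full mass) if no grid value is sufficient.
    """
    for tau in sorted(grid):
        if empirical_coverage(probs, labels, tau) >= 1 - alpha:
            return tau
    return 1.0
```

Plotting `empirical_coverage` over the grid gives a curve of the kind shown in Figure 2. The early exit in `pick_tau` returns the smallest sufficient value only when that curve is non-decreasing.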

Theorems & Definitions (15)

  • Theorem 1
  • Definition 1
  • Lemma 1
  • Theorem 2
  • Corollary 1
  • Theorem A.1
  • proof
  • Lemma A.1
  • proof
  • Lemma 2
  • ...and 5 more