
Non-monotonicity in Conformal Risk Control

Tareq Aldirawi, Yun Li, Wenge Guo

Abstract

Conformal risk control (CRC) provides distribution-free guarantees for controlling the expected loss at a user-specified level. Existing theory typically assumes that the loss decreases monotonically with a tuning parameter that governs the size of the prediction set. This assumption is often violated in practice, where losses may behave non-monotonically due to competing objectives such as coverage and efficiency. We study CRC under non-monotone loss functions when the tuning parameter is selected from a finite grid, a common scenario in thresholding or discretized decision rules. Revisiting a known counterexample, we show that the validity of CRC without monotonicity depends on the relationship between the calibration sample size and the grid resolution. In particular, risk control can still be achieved when the calibration sample is sufficiently large relative to the grid. We provide a finite-sample guarantee for bounded losses over a grid of size $m$, showing that the excess risk above the target level $α$ is of order $\sqrt{\log(m)/n}$, where $n$ is the calibration sample size. A matching lower bound shows that this rate is minimax optimal. We also derive refined guarantees under additional structural conditions, including Lipschitz continuity and monotonicity, and extend the analysis to settings with distribution shift via importance weighting. Numerical experiments on synthetic multilabel classification and real object detection data illustrate the practical impact of non-monotonicity. Methods that account for finite-sample deviations achieve more stable risk control than approaches based on monotonicity transformations, while maintaining competitive prediction-set sizes.


Paper Structure

This paper contains 40 sections, 10 theorems, 158 equations, 6 figures, 3 tables.

Key Result

Theorem 1

Suppose that for each $\lambda \in \Lambda = \{\lambda_1, \ldots, \lambda_m\}$, the losses $\{L_i(\lambda)\}_{i=1}^{n+1}$ are i.i.d. and satisfy $L_i(\lambda) \in [0, B]$. Assume there exists $\lambda^\star \in \Lambda$ such that $R(\lambda^\star) \leq \alpha - D(m,n)$. Define $\hat{\lambda}_n = \inf\{\lambda \in \Lambda : \hat{R}_n^{+}(\lambda) \leq \alpha - D(m,n)\}$, with the convention that $\inf \varnothing = \lambda_m$. Then $\mathbb{E}[L_{n+1}(\hat{\lambda}_n)] \leq \alpha$, where $D(m,n) = B\sqrt{\log(2m)/(2n)}$ is the uniform deviation term over the grid, of order $\sqrt{\log(m)/n}$.
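The CRC-NM selection rule (choose the smallest grid threshold whose empirical risk falls below the adjusted level $\alpha - D(m,n)$, falling back to $\lambda_m$ when no threshold qualifies) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the Hoeffding-type form $D(m,n) = B\sqrt{\log(2m)/(2n)}$ is an assumed deviation term, and the helper name `crc_nm_threshold` is hypothetical.

```python
import numpy as np

def crc_nm_threshold(losses, alpha, B=1.0):
    """Select the smallest grid index whose empirical risk is at most
    the adjusted level alpha - D(m, n).

    losses: (n, m) array with losses[i, j] = L_i(lambda_j), assumed to
            lie in [0, B], columns ordered by increasing lambda.
    """
    n, m = losses.shape
    # Assumed Hoeffding-type uniform deviation term over the m grid points.
    D = B * np.sqrt(np.log(2 * m) / (2 * n))
    risk_hat = losses.mean(axis=0)          # empirical risk at each grid point
    feasible = np.flatnonzero(risk_hat <= alpha - D)
    # Convention inf(empty set) = lambda_m: fall back to the last grid index.
    return int(feasible[0]) if feasible.size else m - 1
```

Note that the correction $D(m,n)$ shrinks as the calibration size $n$ grows relative to $\log(m)$, which is exactly the regime in which the abstract's guarantee kicks in.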

Figures (6)

  • Figure 1: Empirical risk curves and selected thresholds under a non-monotonic loss ($n = 10{,}000$, $m = 100$, $\alpha = 0.10$). The empirical risk $\hat{R}_n^{+}(\lambda)$ (blue) exhibits a non-monotonic bump near $\lambda \approx 0.35$. Loss monotonization $\tilde{R}_n^{+}(\lambda)$ (orange) and risk monotonization $\hat{R}_n^{\uparrow}(\lambda)$ (green) enforce monotonicity via right-envelope corrections and select at level $\alpha$ (dashed grey line). CRC-NM selects at the adjusted level $\alpha' = \alpha - D(m,n)$ (dotted line). Open circles mark the selected thresholds. In this example, despite operating at a stricter level, CRC-NM selects the smallest threshold.
  • Figure 2: Risk distributions on ImageNet using ResNet-18 predictions with $n = 40{,}000$ calibration samples, $m = 200$ candidate thresholds, and target level $\alpha = 0.15$, averaged over $5{,}000$ random calibration–test splits. The loss combines miscoverage with a small oversize penalty, inducing non-monotonic behavior with respect to the prediction-set threshold. CRC, CRC-C, and CRC-NM achieve similar empirical risk levels, while CRC-NM applies a larger explicit correction.
  • Figure 3: Synthetic multilabel experiment based on $1{,}000$ repetitions. Left: distribution of test risks; the dashed line indicates the target level $\alpha = 0.15$. Loss and risk monotonization select extreme thresholds when the monotonized risk exceeds $\alpha$ across the entire grid, resulting in test risks far above the target. Right: distribution of prediction set sizes across methods.
  • Figure 4: COCO object detection experiment. Left: distribution of test risks across repeated calibration–test splits; the dashed line indicates the target risk level $\alpha=0.33$. Right: distribution of prediction-set sizes for each method.
  • Figure 5: Excess risk bounds as a function of sample size for $\sigma_{\max}/B=0.3$ and $m=200$. The Bernstein bound (dashed) improves substantially over Hoeffding (solid) throughout the range $n\in[500,\,20{,}000]$. The empirical Bernstein bound (dash-dotted, $\delta=0.05$) tracks the oracle Bernstein bound at large $n$, with a larger gap at small $n$ due to the variance estimation penalty and the additional $\log(1/\delta)$ factor.
  • ...and 1 more figure
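The qualitative behavior in Figure 5 (a variance-aware Bernstein bound beating Hoeffding when $\sigma_{\max}/B$ is small, and an empirical Bernstein bound paying an extra variance-estimation and $\log(1/\delta)$ penalty) can be checked with textbook-shaped bounds. These are assumed standard forms with a union bound over the $m$ grid points; the paper's exact constants may differ.

```python
import numpy as np

def hoeffding_bound(n, m, B=1.0):
    # Uniform Hoeffding deviation over a grid of m points (assumed form).
    return B * np.sqrt(np.log(2 * m) / (2 * n))

def bernstein_bound(n, m, sigma_max, B=1.0):
    # Variance-aware Bernstein bound, union over m grid points (assumed form).
    log_term = np.log(2 * m)
    return sigma_max * np.sqrt(2 * log_term / n) + 2 * B * log_term / (3 * n)

def empirical_bernstein_bound(n, m, var_hat, delta=0.05, B=1.0):
    # Maurer-Pontil-style empirical Bernstein bound using the estimated
    # variance var_hat, with a union bound over m grid points (assumed form).
    log_term = np.log(2 * m / delta)
    return np.sqrt(2 * var_hat * log_term / n) + 7 * B * log_term / (3 * (n - 1))
```

With $m=200$ and $\sigma_{\max}/B = 0.3$ as in Figure 5, the Bernstein bound is smaller than the Hoeffding bound across $n \in [500, 20{,}000]$, since the variance term $\sigma_{\max}\sqrt{2\log(2m)/n}$ dominates and $\sigma_{\max} < B/\sqrt{2}$.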

Theorems & Definitions (20)

  • Theorem 1
  • Remark 1: Feasibility
  • Proposition 1: Lower bound
  • Proposition 2: Lipschitz Refinement
  • Proposition 3: Exact Control under Monotonicity
  • Proof
  • Proposition 4: Distributional shift
  • Lemma 1: Uniform Concentration Bound
  • Proof
  • Proof
  • ...and 10 more