Conformal Risk Control

Anastasios N. Angelopoulos; Stephen Bates; Adam Fisch; Lihua Lei; Tal Schuster

TL;DR

The paper extends conformal prediction to control the expected value of any bounded, monotone loss function, yielding a distribution-free framework (conformal risk control) that is tight up to an O(1/n) factor. It introduces a practical algorithm that selects a conservativeness parameter by calibrating on a holdout set, proves an upper bound on the risk together with a matching lower bound, and shows that standard conformal prediction is recovered as the special case of the miscoverage loss. The authors demonstrate the approach on four real-world tasks from computer vision and NLP: FNR control in tumor segmentation and in multilabel classification, graph-distance control in hierarchical classification, and F1-score control in open-domain QA. Extensions cover distribution shift, quantile risk control, multiple and adversarial risks, and expectations of U-statistics.

Abstract

We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversarial risk control, and expectations of U-statistics. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.
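
As a rough illustration of the calibration step the abstract describes, the sketch below picks the smallest parameter on a grid whose adjusted empirical risk falls below the target level. It assumes the paper's selection rule $\frac{n}{n+1}\hat{R}_n(\lambda) + \frac{B}{n+1} \le \alpha$ with losses bounded by $B$ and non-increasing in $\lambda$. The function name `calibrate_lambda`, the grid representation, and the synthetic scores are hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np

def calibrate_lambda(loss_table, lambdas, alpha, B=1.0):
    """Pick the smallest grid value whose adjusted empirical risk is <= alpha.

    loss_table[i, j] is the loss of calibration example i at lambdas[j].
    Assumptions: lambdas is sorted ascending, every row is non-increasing
    along the grid, and every loss is at most B.
    """
    loss_table = np.asarray(loss_table, dtype=float)
    n = loss_table.shape[0]
    empirical_risk = loss_table.mean(axis=0)
    # Treat the unseen test point as if it incurred the worst-case loss B.
    adjusted = (n / (n + 1)) * empirical_risk + B / (n + 1)
    valid = np.flatnonzero(adjusted <= alpha)
    if valid.size == 0:
        # No grid value qualifies: fall back to the most conservative one.
        return lambdas[-1]
    return lambdas[valid[0]]

# Toy check with the miscoverage loss L_i(lambda) = 1{score_i > lambda}.
rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)          # synthetic calibration scores
lambdas = np.sort(scores)                # use the scores themselves as the grid
loss_table = (scores[:, None] > lambdas[None, :]).astype(float)
lam_hat = calibrate_lambda(loss_table, lambdas, alpha=0.1)
```

With the miscoverage loss, the selected value coincides with the usual split conformal quantile, the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score, which is one way to see the reduction to conformal prediction mentioned in the summary.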

Paper Structure (22 sections, 12 theorems, 92 equations, 4 figures)

Introduction
Algorithm and preview of main results
Related work
Theory
Risk control
A risk lower bound
Conformal prediction reduces to risk control
Controlling general loss functions
Examples
FNR control in tumor segmentation
FNR control in multilabel classification
Control of graph distance in hierarchical image classification
F1-score control in open-domain question answering
Extensions
Risk control under distributional shift
...and 7 more sections

Key Result

Theorem 2.1

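Assume that $L_{i}(\lambda)$ is non-increasing in $\lambda$, right-continuous, and

$$L_{i}(\lambda_{\max}) \le \alpha, \qquad \sup_{\lambda} L_{i}(\lambda) \le B < \infty \quad \text{almost surely}.$$

Then

$$\mathbb{E}\big[L_{n+1}(\hat{\lambda})\big] \le \alpha.$$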
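
The guarantee concerns a parameter $\hat{\lambda}$ chosen from $n$ calibration losses $L_1, \dots, L_n$. This summary does not reproduce its definition, so the block below restates the construction as given in the paper, with $B$ denoting the upper bound on the loss and $\alpha$ the target risk level.

```latex
\hat{R}_n(\lambda) = \frac{1}{n} \sum_{i=1}^{n} L_i(\lambda),
\qquad
\hat{\lambda} = \inf\left\{ \lambda : \frac{n}{n+1}\,\hat{R}_n(\lambda) + \frac{B}{n+1} \le \alpha \right\}.
```

One way to read the inflated quantity: it is the average loss over $n+1$ points when the unseen test loss $L_{n+1}(\lambda)$ is replaced by its worst case $B$, so the correction vanishes at rate $1/n$.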

Figures (4)

Figure 1: FNR control in tumor segmentation. The top figure shows examples of our procedure with correct pixels in white, false positives in blue, and false negatives in red. The bottom plots report FNR and set size over 1000 independent random data splits. The dashed gray line marks $\alpha$.
Figure 2: FNR control on MS COCO. The top figure shows examples of our procedure with correct classes in black, false positives in blue, and false negatives in red. The bottom plots report FNR and set size over 1000 independent random data splits. The dashed gray line marks $\alpha$.
Figure 3: Control of graph distance on hierarchical ImageNet. The top figure shows examples of our procedure with correct classes in black, false positives in blue, and false negatives in red. The bottom plots report our minimum hierarchical distance loss and set size over 1000 independent random data splits. The dashed gray line marks $\alpha$.
Figure 4: F1-score control on Natural Questions. The top figure shows examples of our procedure with fully correct answers in green, partially correct answers in blue, and false positives in gray. Note that due to the nature of the evaluation, answers that are technically correct may still be down-graded if they do not match the reference. We treat this as part of the randomness in the task. The bottom plots report the F1 risk and average set size over 1000 independent random data splits. The dashed gray line marks $\alpha$.
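
To make the FNR examples in Figures 1 and 2 more concrete, here is a minimal sketch of how a per-example false negative rate could be tabulated over a grid of parameters. It assumes prediction sets of the form $\{k : f(x)_k \ge 1 - \lambda\}$, so that larger $\lambda$ gives larger sets and a smaller loss. The function name `fnr_loss_table` and the toy inputs are hypothetical and serve only as illustration.

```python
import numpy as np

def fnr_loss_table(probs, labels, lambdas):
    """Per-example false negative rate at each lambda.

    probs:   (n, K) array of class scores in [0, 1].
    labels:  (n, K) binary array; each row is assumed to have >= 1 positive.
    lambdas: (J,) grid; class k is included when probs >= 1 - lambda.
    Returns an (n, J) array whose rows are non-increasing in lambda.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = 1.0 - np.asarray(lambdas, dtype=float)
    # included[i, j, k]: class k is in the set for example i at lambdas[j]
    included = probs[:, None, :] >= thresholds[None, :, None]
    hits = (included & labels[:, None, :]).sum(axis=2)
    return 1.0 - hits / labels.sum(axis=1, keepdims=True)

# Toy inputs: two examples, three classes, three grid values.
probs = [[0.9, 0.2, 0.6],
         [0.1, 0.8, 0.3]]
labels = [[1, 0, 1],
          [0, 1, 1]]
table = fnr_loss_table(probs, labels, lambdas=[0.0, 0.5, 1.0])
```

A table of this shape is bounded by 1 and monotone in $\lambda$, which are the properties the risk-control guarantee relies on.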

Theorems & Definitions (23)

Theorem 2.1
proof
Theorem 2.2
Proposition 2.1
Corollary 1
Proposition 4.1
Proposition 4.2
Proposition 4.3
Proposition 4.4
Proposition 4.5
...and 13 more
