Non-Exchangeable Conformal Risk Control

António Farinhas; Chrysoula Zerva; Dennis Ulmer; André F. T. Martins


TL;DR

This work extends conformal prediction by introducing non-exchangeable conformal risk control (non-X CRC), which provides formal guarantees on the expected value of monotone losses even when data are non-exchangeable due to drift or change points. By weighting the calibration data, bounding the resulting slack with the total variation (TV) distance, and choosing weights through a maximum-entropy design, the approach can yield tighter risk control than standard conformal risk control (CRC) under drift, and it recovers standard CRC under exchangeability. The authors validate the framework on synthetic time-series multilabel classification, electricity usage monitoring, and open-domain question answering (QA), showing better adherence to the target risk and smaller prediction sets when non-exchangeability is exploited. The method offers practical guarantees for deployment in non-i.i.d. settings and applies broadly to tasks where distribution drift is expected, potentially including large language models and reinforcement learning.
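The TL;DR mentions a maximum-entropy weight design without giving its form. As a hedged illustration only, assume the design trades off a per-example drift proxy $\epsilon_i$ (a stand-in for the unknown TV distance) against the entropy $H$ of the normalized weights, i.e. it minimizes $\sum_i p_i \epsilon_i - \beta H(p)$ over the probability simplex. That objective has the closed-form minimizer $p_i \propto \exp(-\epsilon_i / \beta)$. The helper `max_entropy_weights`, the proxy "age of the calibration point", and the value of `beta` are hypothetical choices, not taken from the paper.

```python
import numpy as np

def max_entropy_weights(eps, beta):
    """Hypothetical helper: weights p on the simplex minimizing
    sum_i p_i * eps_i - beta * H(p), where H is the Shannon entropy.

    The closed-form minimizer is softmax(-eps / beta): large beta gives
    near-uniform weights, small beta concentrates weight on low-eps points.
    """
    logits = -np.asarray(eps, dtype=float) / beta
    logits -= logits.max()          # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Assumed drift proxy: the age of each calibration point (10 = oldest, 1 = newest).
ages = np.arange(10, 0, -1)
w = max_entropy_weights(ages, beta=2.0)
print(w.round(3))                   # weights increase toward recent points
print(w / w.max())                  # rescaled so the newest point has weight 1
```

Rescaling by the maximum gives $w_i = e^{-(\epsilon_i - \min_j \epsilon_j)/\beta}$, which is a geometric decay in age under this proxy; letting $\beta \to \infty$ makes all weights equal, the setting in which standard CRC is recovered.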

Abstract

Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ground truth. While the original formulation assumes data exchangeability, some extensions handle non-exchangeable data, as is often the case in real-world scenarios. In parallel, some progress has been made in conformal methods that provide statistical guarantees for a broader range of objectives, such as bounding the best $F_1$-score or minimizing the false negative rate in expectation. In this paper, we leverage and extend these two lines of work by proposing non-exchangeable conformal risk control, which allows controlling the expected value of any monotone loss function when the data is not exchangeable. Our framework is flexible, makes very few assumptions, and allows weighting the data based on its relevance for a given test example; a careful choice of weights may result in tighter bounds, making our framework useful in the presence of change points, time series, or other forms of distribution drift. Experiments with both synthetic and real-world data show the usefulness of our method.
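The abstract describes choosing a threshold that controls the expected value of a monotone loss using weighted calibration data. A minimal sketch, assuming the weighted rule $\hat{\lambda} = \inf\{\lambda : \sum_{i=1}^{n} \tilde{w}_i L_i(\lambda) + \tilde{w}_{n+1} B \le \alpha\}$ with normalized weights $\tilde{w}_i = w_i / (1 + \sum_{j=1}^{n} w_j)$, test weight $w_{n+1} = 1$, and losses $L_i(\lambda)$ nonincreasing in $\lambda$ and bounded by $B$. With all $w_i = 1$ this reduces to the standard CRC condition $\frac{n}{n+1}\hat{R}_n(\lambda) + \frac{B}{n+1} \le \alpha$. The false negative rate loss, the $\lambda$ grid, the synthetic data, and the helper names `weighted_crc_lambda` and `fnr_loss` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def weighted_crc_lambda(losses, weights, alpha, B=1.0):
    """Hypothetical helper: index of the smallest grid lambda with
    sum_i w~_i L_i(lambda) + w~_{n+1} B <= alpha.

    losses:  (n, m) array, losses[i, k] = L_i(lambdas[k]), nonincreasing in k
    weights: (n,) calibration weights in [0, 1]; the test point has weight 1
    Falls back to the last (largest) lambda if no grid point qualifies.
    """
    losses = np.asarray(losses, dtype=float)
    w = np.asarray(weights, dtype=float)
    denom = w.sum() + 1.0                        # +1 for the test point
    bound = (w / denom) @ losses + B / denom     # weighted risk bound per lambda
    ok = np.flatnonzero(bound <= alpha)
    return int(ok[0]) if ok.size else losses.shape[1] - 1

def fnr_loss(scores, labels, lambdas):
    """False negative rate of C_lambda(x) = {k : score_k >= 1 - lambda}."""
    in_set = scores[:, None, :] >= 1.0 - lambdas[None, :, None]     # (n, m, K)
    hits = (in_set & labels[:, None, :].astype(bool)).sum(axis=-1)  # (n, m)
    return 1.0 - hits / labels.sum(axis=-1, keepdims=True)

# Synthetic multilabel data (an assumption, not the paper's setup).
rng = np.random.default_rng(0)
n, K = 200, 5
labels = (rng.random((n, K)) < 0.4).astype(int)
labels[labels.sum(axis=1) == 0, 0] = 1          # at least one positive per row
scores = np.clip(0.5 * labels + 0.5 * rng.random((n, K)), 0.0, 1.0)
lambdas = np.linspace(0.0, 1.0, 101)
L = fnr_loss(scores, labels, lambdas)

uniform = weighted_crc_lambda(L, np.ones(n), alpha=0.1)
decayed = weighted_crc_lambda(L, 0.99 ** np.arange(n)[::-1], alpha=0.1)
print(lambdas[uniform], lambdas[decayed])
```

The geometric weights $0.99^{\,n-i}$ are an arbitrary choice that gives the newest calibration point weight 1. Down-weighting older points shifts the weighted risk toward recent data but also shrinks $\sum_j w_j$, which enlarges the test-point term $\tilde{w}_{n+1} B$; that is the trade-off a weight design has to balance.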


Paper Structure (17 sections, 2 theorems, 24 equations, 4 figures, 2 tables)


Introduction
Background
Conformal prediction
Non-exchangeable conformal prediction
Conformal risk control
Non-exchangeable conformal risk control
Formal guarantees
How to choose weights
Experiments
Multilabel classification in a time series
Monitoring electricity usage
Open-domain question answering
Related work
Conclusions
Proof of Lemma 1
...and 2 more sections

Key Result

Lemma 1

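Let $f: S \rightarrow [A,B] \subset \mathbb{R}$ be a bounded function on a measurable space $(S, \mathcal{A})$ (where $\mathcal{A} \subseteq 2^S$ is a $\sigma$-algebra) and let $P$ and $Q$ be two probability measures on $(S, \mathcal{A})$. Then

$$\left| \mathbb{E}_P[f] - \mathbb{E}_Q[f] \right| \le (B - A)\, d_{\mathrm{TV}}(P, Q),$$

where $d_{\mathrm{TV}}(P, Q) = \sup_{E \in \mathcal{A}} |P(E) - Q(E)|$ is the total variation distance.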
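On a finite space the supremum over events equals half the $\ell_1$ distance, $d_{\mathrm{TV}}(P, Q) = \tfrac{1}{2} \sum_s |P(s) - Q(s)|$, so the total-variation bound $|\mathbb{E}_P[f] - \mathbb{E}_Q[f]| \le (B - A)\, d_{\mathrm{TV}}(P, Q)$ can be checked numerically. The sketch below uses a hypothetical helper `tv_distance` and random Dirichlet distributions; it tests the inequality on random instances and also shows that the bound is attained by $f = B$ on $\{P > Q\}$ and $f = A$ elsewhere.

```python
import numpy as np

def tv_distance(p, q):
    # On a finite space, sup_E |P(E) - Q(E)| equals half the L1 distance.
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

rng = np.random.default_rng(1)
A, B = -2.0, 3.0
gaps = []
for _ in range(1000):
    size = rng.integers(2, 20)
    p = rng.dirichlet(np.ones(size))
    q = rng.dirichlet(np.ones(size))
    f = rng.uniform(A, B, size)           # bounded function f: S -> [A, B]
    lhs = abs(p @ f - q @ f)              # |E_P[f] - E_Q[f]|
    rhs = (B - A) * tv_distance(p, q)     # (B - A) * d_TV(P, Q)
    gaps.append(rhs - lhs)
print(min(gaps) >= -1e-12)                # bound holds up to float rounding

# The bound is attained by f = B on {P > Q} and f = A elsewhere.
f_star = np.where(p > q, B, A)
print(np.isclose(abs(p @ f_star - q @ f_star), (B - A) * tv_distance(p, q)))
```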

Figures (4)

Figure 1: Average loss (top) and $\hat{\lambda}$ (bottom) over $10$ independent trials for settings (1), (2), and (3). We smooth all the curves by taking a rolling average with a window of $30$ time points.
Figure 2: Results on ELEC2 data for $\alpha=0.05$, with $\lambda$ defined as the prediction interval width. The curves are smoothed with a rolling average over a window of 300 data points.
Figure 3: $F_1$-score control on the Natural Questions dataset. Average set size (left) and risk (right) over 1000 independent random data splits.
Figure 4: Average loss (top) and $\hat{\lambda}$ (bottom) over $10$ independent trials for settings (1), (2), and (3). In this case, $\lambda$ represents the number of predicted labels. We smooth the curves by taking a rolling average with a window of $30$ time points.

Theorems & Definitions (3)

Definition 1: Exchangeable data distribution
Lemma 1
Theorem 1: Non-exchangeable conformal risk control
