Non-Exchangeable Conformal Risk Control
António Farinhas, Chrysoula Zerva, Dennis Ulmer, André F. T. Martins
TL;DR
This work extends conformal prediction by introducing non-exchangeable conformal risk control (non-X CRC), which provides formal guarantees on the expected value of monotone losses even when data are non-exchangeable due to drift or change points. By combining weighted calibration with a TV-distance bound and a max-entropy weight design, the approach yields tighter risk control than standard methods and recovers traditional CRC under exchangeability. The authors validate the framework on synthetic time-series multilabel classification, electricity usage monitoring, and open-domain QA, showing improved risk adherence and smaller prediction sets when exploiting non-exchangeability. The method offers practical guarantees for deployment in non-i.i.d. settings and broad applicability to tasks where distribution drift is expected, including potential use in large language models and RL contexts.
Abstract
Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ground truth. While the original formulation assumes data exchangeability, some extensions handle non-exchangeable data, which is common in real-world scenarios. In parallel, some progress has been made in conformal methods that provide statistical guarantees for a broader range of objectives, such as bounding the best $F_1$-score or minimizing the false negative rate in expectation. In this paper, we leverage and extend these two lines of work by proposing non-exchangeable conformal risk control, which allows controlling the expected value of any monotone loss function when the data is not exchangeable. Our framework is flexible, makes very few assumptions, and allows weighting the data based on its relevance for a given test example; a careful choice of weights may result in tighter bounds, making our framework useful in the presence of change points, time series, or other forms of distribution drift. Experiments with both synthetic and real-world data show the usefulness of our method.
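
To make the idea of weighted calibration concrete, here is a minimal sketch of how a threshold could be chosen from weighted calibration losses. It is an illustration under stated assumptions, not the paper's reference implementation: the function names (`miscoverage_losses`, `weighted_risk_threshold`), the grid search over candidate thresholds, the fallback to the largest threshold, and the specific selection rule (weighted loss sum plus the loss upper bound, divided by the total weight plus one) are all assumptions introduced here. The loss is assumed to be bounded above by `loss_upper_bound` and non-increasing in the threshold.

```python
from typing import Callable, Sequence


def miscoverage_losses(scores: Sequence[float]) -> Callable[[float], list]:
    """Hypothetical monotone loss: 1 if a calibration score exceeds lambda, else 0.

    The loss is non-increasing in lambda, as the selection rule below requires.
    """
    def losses(lam: float) -> list:
        return [1.0 if s > lam else 0.0 for s in scores]
    return losses


def weighted_risk_threshold(
    losses: Callable[[float], Sequence[float]],
    weights: Sequence[float],
    lambdas: Sequence[float],
    alpha: float,
    loss_upper_bound: float = 1.0,
) -> float:
    """Return the smallest grid value whose corrected weighted risk is <= alpha.

    Assumed rule: (sum_i w_i * L_i(lambda) + B) / (sum_i w_i + 1) <= alpha,
    with weights in [0, 1] and B the upper bound on the loss.
    """
    total_weight = sum(weights)
    for lam in sorted(lambdas):
        weighted_loss = sum(w * l for w, l in zip(weights, losses(lam)))
        corrected_risk = (weighted_loss + loss_upper_bound) / (total_weight + 1.0)
        if corrected_risk <= alpha:
            return lam
    # Assumption: fall back to the most conservative threshold if none qualifies.
    return max(lambdas)


# Toy calibration scores ordered in time (oldest first) and a threshold grid.
scores = [k / 10 for k in range(1, 10)]
grid = [k / 10 for k in range(11)]

# Uniform weights treat all calibration points alike.
uniform = weighted_risk_threshold(miscoverage_losses(scores), [1.0] * 9, grid, alpha=0.25)

# Zero weight on older points mimics discarding data from before a change point.
recent_only = [0.0] * 5 + [1.0] * 4
recent = weighted_risk_threshold(miscoverage_losses(scores), recent_only, grid, alpha=0.25)
```

With uniform weights the rule depends only on the unweighted calibration losses, while down-weighting older points shifts the threshold toward what recent data supports. In this toy case the recent points have higher scores, so the selected threshold grows.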
