Conditional validity of heteroskedastic conformal regression

Nicolas Dewolf; Bernard De Baets; Willem Waegeman

TL;DR

The paper addresses conditional validity in conformal regression under heteroskedastic noise by analyzing and comparing inductive (split) conformal prediction, normalized conformal prediction (NCP), and Mondrian conformal predictors (MCP). It develops theoretical links between conditional validity and pivotal quantities, showing that normalization can yield conditional guarantees under location-scale families, and that MCP delivers class-wise conditional validity when data are partitioned by a taxonomy based on uncertainty. Through synthetic and real-data experiments, the authors demonstrate that MCP provides more stable conditional coverage near the target level, while marginal methods may under- or over-cover in regions of higher uncertainty. The work advances practical uncertainty quantification by enabling adaptive, conditionally valid prediction sets in heteroskedastic regression and offers diagnostic tools to assess conditional performance.

Abstract

Conformal prediction, and split conformal prediction as a specific implementation, offer a distribution-free approach to estimating prediction intervals with statistical guarantees. Recent work has shown that split conformal prediction can produce state-of-the-art prediction intervals when focusing on marginal coverage, i.e. on a calibration dataset the method produces on average prediction intervals that contain the ground truth with a predefined coverage level. However, such intervals are often not adaptive, which can be problematic for regression problems with heteroskedastic noise. This paper tries to shed new light on how prediction intervals can be constructed, using methods such as normalized and Mondrian conformal prediction, in such a way that they adapt to the heteroskedasticity of the underlying process. Theoretical and experimental results are presented in which these methods are compared in a systematic way. In particular, it is shown how the conditional validity of a chosen conformal predictor can be related to (implicit) assumptions about the data-generating distribution.
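The contrast between marginal coverage and adaptivity can be illustrated with a minimal sketch of split conformal regression using the residual and the normalized nonconformity score. Everything in the demo is an assumption made for illustration and is not taken from the paper: the synthetic data model, the hypothetical point predictor `mu_hat`, the hypothetical scale estimators `sigma_true` and `sigma_one`, and the helper names.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the critical score
    return np.inf if k > n else np.sort(scores)[k - 1]

def split_conformal(mu_hat, sigma_hat, x_cal, y_cal, alpha):
    """Calibrate on held-out data and return a function x -> (lower, upper).

    The score is |y - mu_hat(x)| / sigma_hat(x). A constant sigma_hat gives
    the plain residual score, a varying sigma_hat the normalized score.
    """
    scores = np.abs(y_cal - mu_hat(x_cal)) / sigma_hat(x_cal)
    q = conformal_quantile(scores, alpha)

    def interval(x):
        half = q * sigma_hat(x)
        return mu_hat(x) - half, mu_hat(x) + half

    return interval

rng = np.random.default_rng(0)

def sample(n):
    # Assumed toy model: the noise scale grows linearly with x.
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + (0.1 + x) * rng.normal(size=n)
    return x, y

mu_hat = lambda x: 2.0 * x            # hypothetical fitted mean model
sigma_true = lambda x: 0.1 + x        # hypothetical (oracle) scale estimate
sigma_one = lambda x: np.ones_like(x) # constant scale: residual score

alpha = 0.1
x_cal, y_cal = sample(2000)
x_te, y_te = sample(5000)

def coverage(interval, mask):
    lo, hi = interval(x_te[mask])
    return np.mean((lo <= y_te[mask]) & (y_te[mask] <= hi))

everywhere = np.ones_like(x_te, dtype=bool)
high_noise = x_te > 0.8

res = split_conformal(mu_hat, sigma_one, x_cal, y_cal, alpha)
norm = split_conformal(mu_hat, sigma_true, x_cal, y_cal, alpha)

cov_res, cov_norm = coverage(res, everywhere), coverage(norm, everywhere)
cov_res_high, cov_norm_high = coverage(res, high_noise), coverage(norm, high_noise)
print(f"marginal:   residual={cov_res:.3f}  normalized={cov_norm:.3f}")
print(f"high noise: residual={cov_res_high:.3f}  normalized={cov_norm_high:.3f}")
```

Under these assumptions both scores should reach roughly the target marginal coverage, while the constant-width residual intervals tend to under-cover where the noise is largest. The normalized score only helps here because the scale estimate is taken to be correct, which is a strong assumption in practice.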

Paper Structure (19 sections, 7 theorems, 63 equations, 16 figures, 3 tables, 1 algorithm)

Introduction
Problem statement
Conformal prediction
Inductive conformal regression
Normalized conformal prediction (NCP)
Mondrian conformal predictor (MCP)
Theoretical results
Pivotal quantities
Normalization
Experiments on synthetic data
Data types
Deviations from oracle
Diagnostics
Experiments on real data
Models
...and 4 more sections

Key Result

Theorem 1

Let $\Gamma^\alpha:\mathcal{X}\times(\mathcal{X}\times\mathbb{R})^\infty\rightarrow[\mathbb{R}]$ be an inductive conformal predictor at significance level $\alpha\in[0,1]$. If the nonconformity scores are exchangeable for any calibration set $\mathcal{V}$ and any new observation $(\mathbf{x},y)$, then $\Gamma^\alpha$ is marginally valid, i.e. $$\mathrm{Prob}\bigl(Y\in\Gamma^\alpha(X,V)\bigr)\geq1-\alpha,$$ where the probability is taken over both $(X,Y)$ and $V$. Moreover, if the nonconformity scores are ...

Figures (16)

Figure 1: Two data samples with the same trend but with different noise levels: $y(x,s)\sim0.1x+2s+\varepsilon(s)$, where $s\in\{0,1\}$ is a dummy variable labelling the subgroups. The blue subgroup ($s=0$) has standard deviation $0.1$, while the red subgroup ($s=1$) has standard deviation $0.5$. Although the prediction intervals are valid at the $\alpha=0.2$ significance level, both marginally and for the blue subgroup, this is not the case for the red subgroup.
Figure 2: Division of the feature space based on two different taxonomy functions. The first one simply thresholds the second dimension (at $\xi=0.2$), whereas the second one performs equal-frequency binning on the (conditional) standard deviation, which has the form shown in Eq. \ref{problem_example_distribution}.
Figure 3: Probability density function of the triangular distribution (Eq. \ref{triangle}) with (conditional) width parameter $\lambda(\mathbf{x})=5$.
Figure 4: Conditional coverage at significance level $\alpha=0.1$ for synthetic data sets of Types 1, 2 and 3. For every type, the data are divided into three folds based on equal-frequency binning of the estimated variance. The coloured columns indicate the type of misspecification (from left to right): oracle, additive noise on the standard deviation (means of 0.01, 0.1 and 1), scaling of the standard deviation by a factor of 5, and additive noise on the mean (means of 1 and $\widehat{\sigma}$). For every model, three nonconformity measures are shown (from left to right): residual, interval and $\widehat{\sigma}$-normalized nonconformity measure.
Figure 5: CDF plot of the (true) variance. The colors indicate the taxonomy classes with equal-frequency binning ($n=3$ classes).
...and 11 more figures
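The situation described in the caption of Figure 1 can be reproduced with a small simulation. This is a sketch under stated assumptions rather than the authors' experiment. The trend $0.1x+2s$, the two noise levels and $\alpha=0.2$ come from the caption. The feature range, the balanced subgroups, the Gaussian noise, the sample sizes and the perfect point predictor `mu_hat` are assumptions made here for illustration. The sketch compares a single marginal calibration with a Mondrian-style calibration that computes one critical score per subgroup.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.2  # significance level from the Figure 1 caption

def sample(n):
    x = rng.uniform(0.0, 10.0, n)          # assumed feature range
    s = rng.integers(0, 2, n)              # assumed balanced subgroups
    sd = np.where(s == 0, 0.1, 0.5)        # blue: 0.1, red: 0.5
    y = 0.1 * x + 2 * s + rng.normal(0.0, sd)  # assumed Gaussian noise
    return x, s, y

def mu_hat(x, s):
    return 0.1 * x + 2 * s  # assumed: the point predictor recovers the trend

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]

x_cal, s_cal, y_cal = sample(4000)
x_te, s_te, y_te = sample(20000)
r_cal = np.abs(y_cal - mu_hat(x_cal, s_cal))  # residual nonconformity scores
r_te = np.abs(y_te - mu_hat(x_te, s_te))

# Marginal calibration: one critical score for all data.
q_marg = conformal_quantile(r_cal, alpha)

# Mondrian calibration: one critical score per taxonomy class (here: s).
q_mond = {g: conformal_quantile(r_cal[s_cal == g], alpha) for g in (0, 1)}
q_te = np.where(s_te == 0, q_mond[0], q_mond[1])

cov_total = np.mean(r_te <= q_marg)
cov_marg = {g: np.mean(r_te[s_te == g] <= q_marg) for g in (0, 1)}
cov_mond = {g: np.mean((r_te <= q_te)[s_te == g]) for g in (0, 1)}

print(f"marginal calibration, overall: {cov_total:.3f}")
for g, name in ((0, "blue"), (1, "red")):
    print(f"{name}: marginal={cov_marg[g]:.3f}  mondrian={cov_mond[g]:.3f}")
```

Under these assumptions the marginally calibrated intervals should cover close to $1-\alpha=0.8$ overall, over-cover the low-noise subgroup and under-cover the high-noise subgroup, whereas the per-class calibration should bring both subgroups close to the target level.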

Theorems & Definitions (21)

Definition 1: Validity
Example 1
Theorem 1: Marginal validity
Lemma 1
Example 2: Mean-variance estimators
Remark 1
Theorem 2: Conditional validity
Theorem 3: Independence
proof
Remark 2
...and 11 more
