
Conditional validity of heteroskedastic conformal regression

Nicolas Dewolf, Bernard De Baets, Willem Waegeman

TL;DR

The paper addresses conditional validity in conformal regression under heteroskedastic noise by analyzing and comparing inductive (split) conformal prediction, normalized conformal prediction (NCP), and Mondrian conformal predictors (MCP). It develops theoretical links between conditional validity and pivotal quantities, showing that normalization can yield conditional guarantees under location-scale families, and that MCP delivers class-wise conditional validity when data are partitioned by a taxonomy based on uncertainty. Through synthetic and real-data experiments, the authors demonstrate that MCP provides more stable conditional coverage near the target level, while marginal methods may under- or over-cover in regions of higher uncertainty. The work advances practical uncertainty quantification by enabling adaptive, conditionally valid prediction sets in heteroskedastic regression and offers diagnostic tools to assess conditional performance.
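The TL;DR claims that normalization can yield conditional guarantees under location-scale families. A small simulation can illustrate this. The sketch assumes a hypothetical Gaussian location-scale model $Y\mid X=x \sim \mu(x)+\sigma(x)Z$ with oracle (known) $\mu$ and $\sigma$. The functions `mu`, `sigma`, `draw` and `conditional_coverage` are illustrative names, not the authors' code. In this model $(Y-\mu(x))/\sigma(x)$ is pivotal, so the $\sigma$-normalized score should cover close to $1-\alpha$ in both the low-noise and high-noise halves of the feature space. The plain residual score should over-cover where the noise is small and under-cover where it is large.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest score; inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


# Hypothetical location-scale model: Y | X=x ~ mu(x) + sigma(x) * Z, Z ~ N(0, 1)
def mu(x):
    return 2.0 * x


def sigma(x):
    return 0.1 + x  # noise level grows with x (heteroskedastic)


def draw(n, rng):
    """Sample n (x, y) pairs with x ~ Uniform(0, 1)."""
    pts = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        pts.append((x, mu(x) + sigma(x) * rng.gauss(0.0, 1.0)))
    return pts


def conditional_coverage(alpha=0.1, n=5000, seed=0):
    """Coverage of residual vs. sigma-normalized scores in low/high-noise halves."""
    rng = random.Random(seed)
    calib, test = draw(n, rng), draw(n, rng)
    # Residual score |y - mu(x)|: its distribution depends on x.
    q_res = conformal_quantile([abs(y - mu(x)) for x, y in calib], alpha)
    # Normalized score |y - mu(x)| / sigma(x): pivotal, same distribution for all x.
    q_norm = conformal_quantile([abs(y - mu(x)) / sigma(x) for x, y in calib], alpha)

    regions = {"low": [p for p in test if p[0] < 0.5],
               "high": [p for p in test if p[0] >= 0.5]}
    out = {}
    for name, pts in regions.items():
        res = sum(abs(y - mu(x)) <= q_res for x, y in pts) / len(pts)
        norm = sum(abs(y - mu(x)) <= q_norm * sigma(x) for x, y in pts) / len(pts)
        out[name] = (res, norm)  # (residual coverage, normalized coverage)
    return out
```

With estimated rather than oracle $\mu$ and $\sigma$, the pivotal argument only holds approximately. The paper's synthetic experiments examine this kind of misspecification.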

Abstract

Conformal prediction, and split conformal prediction as a specific implementation, offer a distribution-free approach to estimating prediction intervals with statistical guarantees. Recent work has shown that split conformal prediction can produce state-of-the-art prediction intervals when focusing on marginal coverage, i.e. on a calibration dataset the method produces on average prediction intervals that contain the ground truth with a predefined coverage level. However, such intervals are often not adaptive, which can be problematic for regression problems with heteroskedastic noise. This paper tries to shed new light on how prediction intervals can be constructed, using methods such as normalized and Mondrian conformal prediction, in such a way that they adapt to the heteroskedasticity of the underlying process. Theoretical and experimental results are presented in which these methods are compared in a systematic way. In particular, it is shown how the conditional validity of a chosen conformal predictor can be related to (implicit) assumptions about the data-generating distribution.
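The following minimal sketch makes the adaptivity issue concrete by showing both interval constructions. It assumes a point predictor, and for the normalized variant a strictly positive scale estimate $\widehat{\sigma}$, already fitted on a separate training split. The function names are hypothetical. The split interval uses a single quantile of absolute residuals, so its width is the same for every test point. The normalized interval rescales its quantile by $\widehat{\sigma}(\mathbf{x})$ and therefore widens where the estimated noise is larger.

```python
import math


def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest score; inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


def split_interval(y_hat, calib, alpha):
    """calib: list of (prediction, target) pairs; residual score |y - prediction|."""
    q = conformal_quantile([abs(y - p) for p, y in calib], alpha)
    return (y_hat - q, y_hat + q)  # same width for every test point


def normalized_interval(y_hat, sigma_hat, calib, alpha):
    """calib: list of (prediction, sigma_estimate, target); sigma estimates must be > 0."""
    q = conformal_quantile([abs(y - p) / s for p, s, y in calib], alpha)
    return (y_hat - q * sigma_hat, y_hat + q * sigma_hat)  # width scales with sigma_hat
```

For example, with ten calibration residuals $0.1,\dots,1.0$ and $\alpha=0.2$, the conformal quantile is the $\lceil 11\cdot 0.8\rceil = 9$th smallest residual, 0.9. If the required rank exceeds the calibration size, the sketch returns an unbounded interval.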

Paper Structure (19 sections, 7 theorems, 63 equations, 16 figures, 3 tables, 1 algorithm)

Key Result

Theorem 1

Let $\Gamma^\alpha:\mathcal{X}\times(\mathcal{X}\times\mathbb{R})^\infty\rightarrow[\mathbb{R}]$ be an inductive conformal predictor at significance level $\alpha\in[0,1]$. If the nonconformity scores are exchangeable for any calibration set $\mathcal{V}$ and any new observation $(\mathbf{x},y)$, then $\mathrm{Prob}\bigl(Y\in\Gamma^\alpha(X,\mathcal{V})\bigr)\geq1-\alpha$, where the probability is taken over both $(X,Y)$ and $\mathcal{V}$. Moreover, if the nonconformity scores are
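Theorem 1 states the marginal guarantee of inductive conformal prediction under exchangeability. A quick Monte Carlo sketch can check it empirically. The sketch assumes i.i.d. (hence exchangeable) continuous scores drawn from an exponential distribution, an arbitrary choice made for illustration. The helper names are hypothetical. For almost surely distinct scores, the exact coverage equals $k/(n+1)$ with $k=\lceil (n+1)(1-\alpha)\rceil$, which lies in $[1-\alpha,\,1-\alpha+1/(n+1))$. For $n=10$ and $\alpha=0.2$, this gives $9/11\approx0.818$.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest score; inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


def simulate_marginal_coverage(n_calib, alpha, n_trials, seed=0):
    """Fraction of trials in which a fresh test score is at most the conformal quantile.

    Calibration and test scores are i.i.d. Exponential(1), so they are exchangeable
    and almost surely distinct.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        calib = [rng.expovariate(1.0) for _ in range(n_calib)]
        test = rng.expovariate(1.0)
        hits += test <= conformal_quantile(calib, alpha)  # True counts as 1
    return hits / n_trials
```

For `simulate_marginal_coverage(10, 0.2, 20000)`, the estimate should land near $9/11$, up to Monte Carlo error.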

Figures (16)

  • Figure 1: Two data samples with the same trend but different noise levels: $y(x,s)\sim0.1x+2s+\varepsilon(s)$, where $s\in\{0,1\}$ is a dummy variable labelling the subgroups. The blue subgroup ($s=0$) has standard deviation $0.1$, while the red subgroup ($s=1$) has standard deviation $0.5$. The prediction intervals are valid at significance level $\alpha=0.2$ both marginally and for the blue subgroup, but not for the red subgroup.
  • Figure 2: Division of the feature space based on two different taxonomy functions. The first simply thresholds the second dimension (at $\xi=0.2$), whereas the second performs equal-frequency binning on the (conditional) standard deviation, which has the form shown in Eq. (problem_example_distribution).
  • Figure 3: Probability density function of the triangular distribution of Eq. (triangle) with (conditional) width parameter $\lambda(\mathbf{x})=5$.
  • Figure 4: Conditional coverage at significance level $\alpha=0.1$ for synthetic data sets of Types 1, 2 and 3. For every type, the data are divided into three folds based on equal-frequency binning of the estimated variance. The coloured columns indicate the type of misspecification (from left to right): oracle, additive noise on the standard deviation (means of 0.01, 0.1 and 1), scaling of the standard deviation by a factor of 5, and additive noise on the mean (means of 1 and $\widehat{\sigma}$). For every model, three nonconformity measures are shown (from left to right): the residual, interval and $\widehat{\sigma}$-normalized nonconformity measures.
  • Figure 5: CDF plot of the (true) variance. The colors indicate the taxonomy classes with equal-frequency binning ($n=3$ classes).
  • ...and 11 more figures
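The Figure 1 setting also shows why a Mondrian taxonomy helps. The sketch below regenerates data of the form $y = 0.1x + 2s + \varepsilon(s)$ with standard deviations 0.1 and 0.5 at $\alpha=0.2$. It assumes $x$ is uniform on $[0,10]$ and uses an oracle point predictor; the caption specifies neither. The sketch compares one shared conformal quantile with one quantile per taxonomy class. Here the taxonomy is simply the subgroup label $s$. For two equally sized subgroups, equal-frequency binning of the true standard deviation into two classes would produce the same partition. All function names are hypothetical.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest score; inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


def sample_subgroup(n, s, rng):
    """n points (x, s, y) from y = 0.1x + 2s + eps(s); sd 0.1 if s == 0, 0.5 if s == 1."""
    sd = 0.1 if s == 0 else 0.5
    pts = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)  # assumed feature range
        pts.append((x, s, 0.1 * x + 2 * s + rng.gauss(0.0, sd)))
    return pts


def score(point):
    """Absolute residual of an oracle point predictor (an assumption)."""
    x, s, y = point
    return abs(y - (0.1 * x + 2 * s))


def coverage_by_group(alpha=0.2, n=2000, seed=0):
    rng = random.Random(seed)
    calib = sample_subgroup(n, 0, rng) + sample_subgroup(n, 1, rng)
    test = sample_subgroup(n, 0, rng) + sample_subgroup(n, 1, rng)

    # Standard split CP: a single quantile shared by all points.
    q_all = conformal_quantile([score(p) for p in calib], alpha)
    # Mondrian CP: one quantile per taxonomy class (here the class is s).
    q_cls = {c: conformal_quantile([score(p) for p in calib if p[1] == c], alpha)
             for c in (0, 1)}

    result = {}
    for c in (0, 1):
        group = [p for p in test if p[1] == c]
        split_cov = sum(score(p) <= q_all for p in group) / len(group)
        mondrian_cov = sum(score(p) <= q_cls[c] for p in group) / len(group)
        result[c] = (split_cov, mondrian_cov)
    return result  # {class: (split coverage, Mondrian coverage)}
```

In this setup, the shared quantile should over-cover the blue subgroup and under-cover the red one. The class-wise quantiles should bring both subgroups close to the target coverage of 0.8.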

Theorems & Definitions (21)

  • Definition 1: Validity
  • Example 1
  • Theorem 1: Marginal validity
  • Lemma 1
  • Example 2: Mean-variance estimators
  • Remark 1
  • Theorem 2: Conditional validity
  • Theorem 3: Independence
  • Proof
  • Remark 2
  • ...and 11 more