Flexible Conformal Highest Predictive Conditional Density Sets

Max Sampson, Kung-Sik Chan

TL;DR

CHCDS is a conformal prediction framework for conditional highest density (HD) sets. It starts from an estimated conditional density and applies a simple conformal adjustment to achieve unconditional coverage without partitioning the feature space. The method computes nonconformity scores $V_i = \hat{f}(Y_i\mid{\boldsymbol{X}}_i) - \hat{c}({\boldsymbol{X}}_i)$, where $\hat{c}({\boldsymbol{x}})$ is the estimated HD cutoff, and shifts that cutoff by a conformal adjustment $\hat{q}$ computed from the calibration scores to form $\hat{C}({\boldsymbol{x}}) = \{y: \hat{f}(y\mid{\boldsymbol{x}}) > \hat{c}({\boldsymbol{x}}) + \hat{q}\}$. This guarantees $P(Y_{n+1}\in\hat{C}({\boldsymbol{X}}_{n+1}))\ge 1-\alpha$ and, when the conditional density is correctly specified, asymptotically preserves conditional coverage. Because of the conformal adjustment, the coverage guarantee holds even when the conditional density estimator is misspecified; the theoretical results show that, under regularity conditions, the adjustment vanishes as the training and calibration samples grow. Empirical studies, including simulations with multi-modal errors and a real galaxy redshift dataset, show that CHCDS achieves better conditional coverage and smaller sets than partition-based approaches while remaining computationally efficient. The work provides a practical, flexible tool for building reliable predictive regions for complex, multi-modal distributions.
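The score-and-adjust recipe described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the conditional density estimate `f_hat` is a hypothetical two-component Gaussian mixture standing in for a fitted estimator (for example FlexCode or a KNN density), the HD cutoff $\hat{c}({\boldsymbol{x}})$ is found numerically on an assumed fixed grid `Y_GRID`, and $\hat{q}$ is taken to be the $\lfloor \alpha(n+1)\rfloor$-th smallest calibration score, the usual split-conformal convention; the paper's exact quantile convention may differ. The helper names (`hd_cutoff`, `chcds_calibrate`, `chcds_contains`, `simulate`) are illustrative only.

```python
import numpy as np

# Assumption: a fixed y-grid wide enough to hold essentially all of f_hat's mass.
Y_GRID = np.linspace(-15.0, 15.0, 3001)
DY = Y_GRID[1] - Y_GRID[0]


def normal_pdf(y, mu, sd):
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def f_hat(y, x):
    """Hypothetical conditional density estimate: equal mixture of N(x-3, 1) and N(x+3, 1)."""
    return 0.5 * normal_pdf(y, x - 3.0, 1.0) + 0.5 * normal_pdf(y, x + 3.0, 1.0)


def hd_cutoff(x, alpha):
    """Estimated HD cutoff c_hat(x): the density level whose upper level set
    holds (approximately) 1 - alpha of f_hat(. | x)'s mass, found on Y_GRID."""
    dens = np.sort(f_hat(Y_GRID, x))[::-1]  # density values, largest first
    mass = np.cumsum(dens) * DY  # Riemann-sum mass of the top level sets
    idx = min(np.searchsorted(mass, 1.0 - alpha), dens.size - 1)
    return dens[idx]


def chcds_calibrate(x_cal, y_cal, alpha):
    """Return q_hat from the scores V_i = f_hat(Y_i | X_i) - c_hat(X_i)."""
    c = np.array([hd_cutoff(x, alpha) for x in x_cal])
    v = f_hat(y_cal, x_cal) - c
    k = int(np.floor(alpha * (len(v) + 1)))  # assumed quantile index
    return -np.inf if k < 1 else np.sort(v)[k - 1]


def chcds_contains(y, x, alpha, q_hat):
    """Membership in C_hat(x); '>=' instead of '>' so ties cannot lower coverage."""
    return f_hat(y, x) >= hd_cutoff(x, alpha) + q_hat


def simulate(rng, n):
    """Data drawn from the same mixture, so f_hat is correctly specified here."""
    x = rng.uniform(0.0, 5.0, n)
    y = x + 3.0 * rng.choice([-1.0, 1.0], size=n) + rng.normal(0.0, 1.0, n)
    return x, y


rng = np.random.default_rng(0)
alpha = 0.1
x_cal, y_cal = simulate(rng, 500)
q_hat = chcds_calibrate(x_cal, y_cal, alpha)

x_te, y_te = simulate(rng, 2000)
covered = np.array([chcds_contains(y, x, alpha, q_hat) for x, y in zip(x_te, y_te)])
print(f"q_hat = {q_hat:.4f}, empirical coverage = {covered.mean():.3f}")
```

Because `f_hat` matches the data-generating mixture here, `q_hat` should land close to zero, consistent with the claim that the adjustment is negligible under correct specification, and each $\hat{C}(x)$ is a union of two intervals around $x-3$ and $x+3$. Using `>=` rather than the strict `>` of the formula only matters when scores tie; with continuous scores the two coincide.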

Abstract

We introduce our method, conformal highest conditional density sets (CHCDS), which forms conformal prediction sets from existing estimated conditional highest density predictive regions. We prove that the method is valid and that the conformal adjustment is negligible under some regularity conditions. In particular, if the underlying conditional density estimator is correctly specified, the conformal adjustment will be negligible. The conformal adjustment, however, always guarantees nominal unconditional coverage, even when the underlying model is misspecified. We compare the proposed method with other existing methods via simulation and a real data analysis. Our numerical results show that CHCDS outperforms existing methods when the error term is multi-modal and performs as well as existing methods when the error term is unimodal.
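The claim that the conformal adjustment preserves unconditional coverage under misspecification can be illustrated with a deliberately wrong model. In this hypothetical sketch (not taken from the paper), the data have a bimodal error, $Y = x \pm 3 + \varepsilon$ with $\varepsilon \sim N(0,1)$, but `f_hat` is a single Gaussian $N(x, 10)$. Its HD set is the interval $x \pm z_{1-\alpha/2}\sqrt{10}$, so its cutoff is the constant $\varphi(z_{1-\alpha/2})/\sqrt{10}$. As before, $\hat{q}$ is assumed to be the $\lfloor\alpha(n+1)\rfloor$-th smallest calibration score.

```python
from statistics import NormalDist

import numpy as np

ALPHA = 0.1
SIGMA_HAT = np.sqrt(10.0)  # assumed (misspecified) Gaussian model: Y | x ~ N(x, 10)
STD = NormalDist()
Z = STD.inv_cdf(1.0 - ALPHA / 2.0)
C_HAT = STD.pdf(Z) / SIGMA_HAT  # HD cutoff of N(x, SIGMA_HAT^2); it does not depend on x


def f_hat(y, x):
    """Unimodal Gaussian density estimate; deliberately wrong for bimodal data."""
    u = (y - x) / SIGMA_HAT
    return np.exp(-0.5 * u**2) / (SIGMA_HAT * np.sqrt(2.0 * np.pi))


def simulate(rng, n):
    """True model: Y = x +/- 3 + N(0, 1), i.e. a bimodal error term."""
    x = rng.uniform(0.0, 5.0, n)
    y = x + 3.0 * rng.choice([-1.0, 1.0], size=n) + rng.normal(0.0, 1.0, n)
    return x, y


rng = np.random.default_rng(1)
x_cal, y_cal = simulate(rng, 500)
v = f_hat(y_cal, x_cal) - C_HAT  # scores V_i
k = int(np.floor(ALPHA * (len(v) + 1)))  # assumed quantile index
q_hat = np.sort(v)[k - 1]

x_te, y_te = simulate(rng, 2000)
dens = f_hat(y_te, x_te)
unadjusted = np.mean(dens >= C_HAT)  # coverage of the model's own HD set
adjusted = np.mean(dens >= C_HAT + q_hat)  # coverage after the conformal shift
print(f"q_hat = {q_hat:.4f}, unadjusted = {unadjusted:.3f}, adjusted = {adjusted:.3f}")
```

The unadjusted Gaussian region over-covers (about 0.99 in population terms), and a positive $\hat{q}$ shrinks it back toward $1-\alpha$. Coverage is restored, but the set remains a single interval that includes the low-density valley between the two modes, so it is larger than the HD set of a correctly specified density.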

Paper Structure

This paper contains 21 sections, 7 theorems, 64 equations, 30 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

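If $(Y_i, {\boldsymbol{X}}_i), i = 1, \ldots, n$ are exchangeable, then the prediction set $\hat{C}({\boldsymbol{X}}_{n+1})$ constructed by CHCDS satisfies
$$P\big(Y_{n+1}\in\hat{C}({\boldsymbol{X}}_{n+1})\big)\ge 1-\alpha.$$
If the $V_i$'s are almost surely distinct, then
$$P\big(Y_{n+1}\in\hat{C}({\boldsymbol{X}}_{n+1})\big)\le 1-\alpha+\frac{1}{n+1}.$$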
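Theorem 1 relies only on the exchangeability of the scores $V_i$, so its bounds can be checked with any continuous i.i.d. scores. The Monte Carlo sketch below assumes, as in the usual split-conformal analysis, that $\hat{q}$ is the $\lfloor\alpha(n+1)\rfloor$-th smallest of $n$ calibration scores and that coverage means $V_{n+1} \ge \hat{q}$. Under these assumptions the exact coverage is $1 - \lfloor\alpha(n+1)\rfloor/(n+1)$, which lies between the lower bound $1-\alpha$ and $1-\alpha+1/(n+1)$; the upper end assumes the second inequality of Theorem 1 takes this standard form with $n$ calibration scores. The function name `split_conformal_coverage` is hypothetical.

```python
import numpy as np


def split_conformal_coverage(n, alpha, reps, rng):
    """Monte Carlo estimate of P(V_{n+1} >= q_hat), where q_hat is the
    floor(alpha * (n + 1))-th smallest of n exchangeable calibration scores."""
    scores = rng.random((reps, n + 1))  # i.i.d. (hence exchangeable), a.s. distinct
    k = int(np.floor(alpha * (n + 1)))
    q_hat = np.sort(scores[:, :n], axis=1)[:, k - 1]
    return np.mean(scores[:, n] >= q_hat), k


rng = np.random.default_rng(2)
n, alpha = 24, 0.1
coverage, k = split_conformal_coverage(n, alpha, 100_000, rng)
print(f"k = {k}, coverage = {coverage:.4f}")
print(f"exact 1 - k/(n+1) = {1 - k / (n + 1):.4f}")
print(f"bounds: [{1 - alpha:.4f}, {1 - alpha + 1 / (n + 1):.4f}]")
```

With $n = 24$ and $\alpha = 0.1$, $k = 2$ and the exact coverage is $1 - 2/25 = 0.92$, inside the interval $[0.90, 0.94]$.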

Figures (30)

  • Figure 1: Comparison of conditional coverage in the mixture scenario. Left: unadjusted FlexCode (black squares), HPD-split (red diamonds), DCP (green triangles). Right: CHCDS (Gaussian Mix) (purple X), CHCDS (KNN) (blue circles), CQR (orange nablas), and CHR (black circles). The dashed line marks the desired 90% coverage; the other lines show the conditional coverage at each value of $X$.
  • Figure S1: The HPD-split score for a sample $(y_i, {\boldsymbol{x}}_i)$ is the shaded region of the plot.
  • Figure S2: The CHCDS score for a sample $(y_i, {\boldsymbol{x}}_i)$ is shown by the arrow.
  • Figure S3: A plot showing the conditional coverage of CHCDS (solid line) and the negative density score (open circles) at a given value of X. The dashed line represents the desired 90% coverage.
  • Figure S4: Comparison of conditional coverage between CHCDS-subtraction Kernel (blue) and CHCDS-division Kernel (black) in the mixture scenario. The dashed line shows the desired 99% coverage.
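Figures S1 and S2 contrast the two nonconformity scores at a single covariate value. The sketch below computes both for a hypothetical bimodal density estimate with modes at $-3$ and $+3$. It assumes the HPD-split score is the estimated probability mass of the upper level set $\{y' : \hat{f}(y'\mid{\boldsymbol{x}}) \ge \hat{f}(y\mid{\boldsymbol{x}})\}$ (the shaded area), while the CHCDS score is the vertical gap $\hat{f}(y\mid{\boldsymbol{x}}) - \hat{c}({\boldsymbol{x}})$ (the arrow). Integrals are approximated on an assumed grid `Y_GRID`.

```python
import numpy as np

Y_GRID = np.linspace(-12.0, 12.0, 4801)
DY = Y_GRID[1] - Y_GRID[0]


def normal_pdf(y, mu, sd):
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def f_hat(y):
    """Hypothetical estimated density of Y at one fixed x: modes at -3 and +3."""
    return 0.5 * normal_pdf(y, -3.0, 1.0) + 0.5 * normal_pdf(y, 3.0, 1.0)


GRID_DENS = f_hat(Y_GRID)


def hpd_split_score(y):
    """Assumed HPD-split score: estimated mass of {y': f_hat(y') >= f_hat(y)}."""
    return np.sum(GRID_DENS[GRID_DENS >= f_hat(y)]) * DY


def chcds_score(y, alpha=0.1):
    """CHCDS score: vertical gap f_hat(y) - c_hat between the density and the HD cutoff."""
    dens = np.sort(GRID_DENS)[::-1]  # largest densities first
    mass = np.cumsum(dens) * DY
    c_hat = dens[min(np.searchsorted(mass, 1.0 - alpha), dens.size - 1)]
    return f_hat(y) - c_hat


for y in (3.0, 1.5, 0.0):
    print(f"y = {y:4.1f}: HPD-split = {hpd_split_score(y):.3f}, CHCDS = {chcds_score(y):+.4f}")
```

At a fixed $x$ both scores are monotone in $\hat{f}(y\mid{\boldsymbol{x}})$, in opposite directions: a point at a mode gets an HPD-split score near 0 and a positive CHCDS score, while a point in the valley between the modes gets an HPD-split score near 1 and a negative CHCDS score. The two therefore differ only in how scores from different $x$ values are placed on a common scale: probability mass for HPD-split, density gap for CHCDS.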

Theorems & Definitions (12)

  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • proof
  • proof