Flexible Conformal Highest Predictive Conditional Density Sets
Max Sampson, Kung-Sik Chan
TL;DR
CHCDS is a conformal prediction framework for conditional highest density (HD) sets. It combines an estimated conditional density with a simple conformal adjustment to achieve unconditional coverage without data partitioning. The method computes nonconformity scores $V_i = \hat{f}(Y_i\mid{\boldsymbol{X}}_i) - \hat{c}({\boldsymbol{X}}_i)$, where $\hat{c}({\boldsymbol{x}})$ is the estimated HD cutoff, and shifts that cutoff by a conformal quantile $\hat{q}$ of the scores to form $\hat{C}({\boldsymbol{x}}) = \{y: \hat{f}(y\mid{\boldsymbol{x}}) > \hat{c}({\boldsymbol{x}}) + \hat{q}\}$. This guarantees $P(Y_{n+1}\in\hat{C}({\boldsymbol{X}}_{n+1}))\ge 1-\alpha$ and, when the density is correctly specified, asymptotically preserves conditional coverage. Because of the conformal adjustment, the unconditional guarantee holds even when the conditional density estimator is misspecified; theoretical results show that, under regularity conditions, the adjustment vanishes as the training and calibration samples grow. Empirical studies, including simulations with multi-modal errors and a real galaxy redshift dataset, demonstrate that CHCDS outperforms partition-based approaches in conditional coverage and yields smaller sets, while remaining computationally efficient. The work provides a practical, flexible tool for reliable predictive regions in complex, multi-modal distributions.
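The score and cutoff adjustment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy data, the stand-in density estimator (a least-squares line plus a Gaussian kernel density estimate of the residuals), the split sizes, and all function names are hypothetical, and $\hat{q}$ is taken to be the $\lfloor\alpha(n+1)\rfloor$-th smallest calibration score, as in standard split conformal prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1  # target miscoverage level

def simulate(n):
    """Toy data (assumed, not from the paper): Y = X + bimodal error."""
    x = rng.uniform(-1.0, 1.0, size=n)
    eps = rng.choice([-2.0, 2.0], size=n) + 0.5 * rng.standard_normal(n)
    return x, x + eps

x_tr, y_tr = simulate(500)    # used to fit the density estimator
x_cal, y_cal = simulate(500)  # used to compute the conformal adjustment
x_te, y_te = simulate(2000)   # used only to check empirical coverage

# Stand-in conditional density estimator (an assumption of this sketch):
# a fitted line for the mean plus a Gaussian KDE of the training residuals.
slope, intercept = np.polyfit(x_tr, y_tr, 1)
resid = y_tr - (slope * x_tr + intercept)
h = 1.06 * resid.std() * len(resid) ** (-0.2)  # rule-of-thumb bandwidth

def resid_density(r):
    """Gaussian KDE of the training residuals, evaluated at r."""
    u = (np.asarray(r, dtype=float)[..., None] - resid) / h
    return np.exp(-0.5 * u**2).mean(axis=-1) / (h * np.sqrt(2.0 * np.pi))

def f_hat(y, x):
    """Estimated conditional density f(y | x)."""
    return resid_density(np.asarray(y) - (slope * np.asarray(x) + intercept))

# Estimated HD cutoff: the density level whose upper level set holds
# roughly 1 - alpha of the estimated mass (Riemann sum on a grid).
grid = np.linspace(resid.min() - 4 * h, resid.max() + 4 * h, 2000)
step = grid[1] - grid[0]
dens_desc = np.sort(resid_density(grid))[::-1]
mass = np.cumsum(dens_desc) * step
idx = min(np.searchsorted(mass, 1 - alpha), len(mass) - 1)
c0 = dens_desc[idx]

def c_hat(x):
    """HD cutoff at x; constant in x for this location-shift estimator."""
    return np.full(np.shape(x), c0)

# Nonconformity scores V_i = f_hat(Y_i | X_i) - c_hat(X_i) on calibration data.
V = f_hat(y_cal, x_cal) - c_hat(x_cal)

# Conformal adjustment: the k-th smallest score, k = floor(alpha * (n + 1)).
k = int(np.floor(alpha * (len(V) + 1)))
q_hat = np.sort(V)[k - 1] if k >= 1 else -np.inf

# y belongs to the adjusted set iff f_hat(y | x) > c_hat(x) + q_hat.
covered = f_hat(y_te, x_te) > c_hat(x_te) + q_hat
coverage = covered.mean()
print(f"q_hat = {q_hat:.4f}, empirical coverage = {coverage:.3f}")
```

In this toy estimator the cutoff does not depend on $\boldsymbol{x}$, which keeps the sketch short; with a more flexible conditional density estimator, `c_hat` would be recomputed for each covariate value. The adjusted set at a given point can be approximated by evaluating `f_hat` on a grid of $y$ values and keeping those above `c_hat(x) + q_hat`.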
Abstract
We introduce conformal highest conditional density sets (CHCDS), a method that forms conformal prediction sets from existing estimated conditional highest density predictive regions. We prove the validity of the method and show that the conformal adjustment is negligible under some regularity conditions; in particular, it is negligible when the underlying conditional density estimator is correctly specified. The conformal adjustment nevertheless always guarantees nominal unconditional coverage, even when the underlying model is incorrectly specified. We compare the proposed method with existing methods through simulations and a real data analysis. Our numerical results show that CHCDS performs better than existing methods when the error term is multi-modal, and just as well when the error term is unimodal.
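A brief sketch of why an unconditional guarantee of this kind holds, using the standard split-conformal rank argument rather than the authors' own proof. It rests on assumptions that are not stated in the text above: $\hat{f}$ and $\hat{c}$ are fitted on data independent of the $n$ calibration points, the calibration and test pairs are exchangeable, the scores $V_1,\dots,V_{n+1}$ are almost surely distinct, and $\hat{q} = V_{(k)}$ is the $k$-th smallest calibration score with $k = \lfloor \alpha(n+1) \rfloor$ (taking $\hat{q} = -\infty$ when $k = 0$).

$$
\begin{aligned}
Y_{n+1}\in\hat{C}({\boldsymbol{X}}_{n+1}) &\iff V_{n+1} > \hat{q} = V_{(k)}, \\
P\big(V_{n+1} \le V_{(k)}\big) &= P\big(\text{rank of } V_{n+1} \text{ among } V_1,\dots,V_{n+1} \text{ is at most } k\big) = \frac{k}{n+1} \le \alpha, \\
P\big(Y_{n+1}\in\hat{C}({\boldsymbol{X}}_{n+1})\big) &= 1 - \frac{k}{n+1} \ge 1-\alpha .
\end{aligned}
$$

The middle line uses only exchangeability: the rank of $V_{n+1}$ is uniform on $\{1,\dots,n+1\}$ whatever the quality of $\hat{f}$, which is why the guarantee does not require a correctly specified density model.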
