Uncertainty quantification for intervals
Carlos García Meixide, Michael R. Kosorok, Marcos Matabuena
TL;DR
This work addresses uncertainty quantification for interval-censored outcomes by introducing uncervals, a framework that combines conformal inference with the bootstrap to construct predictive regions for interval targets. The authors develop a novel interval-process theory with a Donsker-class guarantee, unbiasedness, and bootstrap validity, yielding finite-sample calibration and asymptotic coverage results. Simulations show improvements in conditional coverage of up to 60% and demonstrate robustness across scenarios, while real-data applications (sleep, age, and physical activity) illustrate practical gains for interval data in healthcare. The approach provides a general, model-compatible tool for reliable interval-target predictions and goodness-of-fit assessments in settings where exact event times are unknown, with potential extensions to multivariate and truncated data. Overall, uncervals advances uncertainty quantification for interval-censored data, offering theoretical guarantees and actionable insights for precision medicine and digital health.
Abstract
Data with an interval structure are increasingly prevalent in scientific applications. In medicine, clinical events are often monitored between two clinical visits, so the exact time of the event is unknown and the outcome is recorded as a range. As interest in automating healthcare decisions grows, uncertainty quantification via predictive regions becomes essential for developing reliable and trustworthy predictive algorithms. However, the statistical literature currently lacks a general methodology for interval targets, especially when these outcomes are incomplete due to censoring. We propose an uncertainty quantification algorithm for interval responses and establish its theoretical properties using empirical process arguments based on a newly developed class of functions specifically designed for these interval data structures. Although this paper primarily focuses on deriving predictive regions for interval-censored data, the approach can also be applied to other statistical modeling tasks, such as goodness-of-fit assessments. Finally, the applicability of the method is demonstrated through simulations, which show up to a 60% improvement in conditional coverage. Our new algorithm is also applied to various biomedical contexts, including two clinical examples: (i) sleep duration and its association with cardiovascular diseases, and (ii) survival time in relation to physical activity levels.
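To make the notion of a predictive region for an interval target concrete, the following is a minimal sketch of generic split conformal calibration applied to fully observed interval responses. It is not the algorithm proposed in the paper: it ignores censoring, uses no bootstrap step, and every modeling choice in it is an assumption made for illustration (the simulated data, the least-squares endpoint models, the Hausdorff-distance nonconformity score, and the hypothetical helper names `hausdorff_score` and `split_conformal_radius`).

```python
import numpy as np


def hausdorff_score(lo, hi, pred_lo, pred_hi):
    """Hausdorff distance between closed intervals [lo, hi] and [pred_lo, pred_hi].

    For two closed intervals this reduces to the larger endpoint discrepancy.
    (Illustrative score choice, not taken from the paper.)
    """
    return np.maximum(np.abs(lo - pred_lo), np.abs(hi - pred_hi))


def split_conformal_radius(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this level
    return np.sort(scores)[k - 1]


def fit_line(x, y):
    """Ordinary least squares fit of y on (1, x)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_line(coef, x):
    return coef[0] + coef[1] * x


# Simulated, fully observed interval responses (assumed data-generating process).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 1.0, n)
center = 2.0 * x + rng.normal(0.0, 0.3, n)
half_width = rng.uniform(0.1, 0.5, n)
lo, hi = center - half_width, center + half_width

# Disjoint training, calibration, and test indices.
train, calib, test = np.split(rng.permutation(n), [1000, 1500])

# Separate linear models for the lower and upper endpoints.
coef_lo = fit_line(x[train], lo[train])
coef_hi = fit_line(x[train], hi[train])

# Calibrate the radius of the predictive region on held-out data.
alpha = 0.1
calib_scores = hausdorff_score(
    lo[calib], hi[calib],
    predict_line(coef_lo, x[calib]), predict_line(coef_hi, x[calib]),
)
radius = split_conformal_radius(calib_scores, alpha)

# The region at a new x is the set of intervals within Hausdorff distance
# `radius` of the predicted interval; check its empirical coverage on test data.
test_scores = hausdorff_score(
    lo[test], hi[test],
    predict_line(coef_lo, x[test]), predict_line(coef_hi, x[test]),
)
coverage = np.mean(test_scores <= radius)
print(f"radius = {radius:.3f}, empirical marginal coverage = {coverage:.3f}")
```

Under exchangeability, this construction targets marginal coverage of at least 1 - alpha. It gives no conditional-coverage guarantee and does not handle incomplete interval outcomes, which are the settings the abstract is concerned with.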
