A Modern Theory of Cross-Validation through the Lens of Stability
Jing Lei
TL;DR
This work develops a stability-centered theory of cross-validation (CV) for uncertainty quantification in modern data settings. It establishes risk-consistency and concentration results for CV under various stability notions, including results for leave-one-out CV (LOOCV) and under sub-Weibull tail conditions, and extends them to online and rolling validation schemes. The paper derives central limit theorems for CV risk with both random and deterministic centering, and advances high-dimensional Gaussian comparisons to enable simultaneous inference across many CV-based risk estimates. It then builds practical tools, such as model confidence sets, difference-based CV inference, and cross-conformal prediction, and connects them to applications such as consistent subset selection and testing many means. Together, these results provide a unified framework for robust predictive inference and model selection under data complexity and black-box estimation, with concrete guidance for variance estimation and inference in high-dimensional settings.
Abstract
Modern data analysis and statistical learning are marked by complex data structures and black-box algorithms. Data complexity stems from technologies such as imaging, remote sensing, wearable devices, and genomic sequencing. At the same time, black-box models, especially deep neural networks, have achieved impressive results. This combination raises new challenges for uncertainty quantification and statistical inference, which we refer to as "black-box inference." Black-box inference is difficult due to the lack of traditional modeling assumptions and the opaque behavior of modern estimators. These factors make it hard to characterize the distribution of estimation errors. A popular solution is post-hoc randomization, which, under mild assumptions such as exchangeability, can yield valid uncertainty quantification. Such methods range from classical techniques like permutation tests, the jackknife, and the bootstrap to more recent innovations like conformal inference. These approaches typically require little knowledge of data distributions or the internal workings of estimators. Many rely on the idea that estimators behave similarly under small perturbations of the data, a concept formalized as stability. Over time, stability has become a key principle in data science, influencing research on generalization error, privacy, and adaptive inference. This article investigates cross-validation (CV), a widely used resampling method, through the lens of stability. We first review recent theoretical results on CV for estimating generalization error and model selection under stability assumptions. We then examine uncertainty quantification for CV-based risk estimates. Together, these insights yield new theory and tools, which we apply to topics including model selection, selective inference, and conformal prediction.
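To make "uncertainty quantification for CV-based risk estimates" concrete, the following is a minimal sketch, not the article's procedure. It assumes ordinary least squares as the estimator, squared-error loss, and a naive normal-approximation interval (CV risk plus or minus 1.96 standard errors of the held-out losses, i.e. random centering). The function names `kfold_cv_risk` and `normal_interval` are hypothetical. Whether such an interval is valid is exactly the kind of question that stability-based central limit theorems address; here it is only illustrated on simulated data.

```python
import numpy as np

def kfold_cv_risk(X, y, n_folds=5, seed=0):
    """Hypothetical helper: K-fold CV estimate of squared-error risk for OLS.

    Returns the CV risk estimate, a naive standard error, and the per-point
    held-out losses. OLS is an illustrative (assumed) choice of estimator.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    losses = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit least squares on the remaining folds only.
        beta, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        # Each point's loss is computed by a model that never saw it.
        resid = y[test_idx] - X[test_idx] @ beta
        losses[test_idx] = resid ** 2
    risk = losses.mean()
    # Naive standard error that treats the n held-out losses as if independent.
    se = losses.std(ddof=1) / np.sqrt(n)
    return risk, se, losses

def normal_interval(risk, se, z=1.96):
    """Hypothetical helper: normal-approximation interval around the CV risk."""
    return risk - z * se, risk + z * se

# Simulated linear model with unit noise variance.
rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))
beta_true = np.arange(1, p + 1, dtype=float)
y = X @ beta_true + rng.normal(size=n)

risk, se, losses = kfold_cv_risk(X, y, n_folds=10)
lo, hi = normal_interval(risk, se)
print(f"CV risk {risk:.3f}, approx. 95% interval ({lo:.3f}, {hi:.3f})")
```

The standard error ignores the dependence among held-out losses induced by shared training data; stability assumptions are one way to argue that this dependence is negligible.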
