Deep Bootstrap
Jinyuan Chang, Yuling Jiao, Lican Kang, Junjie Shi
TL;DR
The paper tackles uncertainty quantification in nonparametric regression by combining conditional diffusion modeling with bootstrap resampling. It learns the conditional distribution $P_{\mathbf{Y}|\mathbf{X}}$ with a variance-preserving diffusion model, averages generated samples to form a regression estimator $\widehat{f}$, and then builds bootstrap samples by pairing the original covariates with newly generated responses, from which it obtains the bootstrap estimator $\widehat{f}^*$ and confidence intervals. The authors derive sharp end-to-end convergence rates in Wasserstein distance for the conditional diffusion model and establish bootstrap consistency and coverage guarantees. In numerical experiments, the method attains accurate interval coverage and scales to higher-dimensional covariates. The framework offers a principled, unified approach to both sampling from complex conditional distributions and nonparametric estimation, with potential extensions to time series and broader nonparametric settings.
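As a rough guide to the notation, the two building blocks can be written as follows. This is a sketch in standard score-based diffusion notation, not necessarily the paper's exact formulation: the noise schedule $\beta(t)$, the Brownian motion $\mathbf{W}_t$, the learned conditional law $\widehat{P}_{\mathbf{Y}|\mathbf{X}=\mathbf{x}}$, and the number of generated samples $M$ are symbols assumed here for illustration.

$$
\mathrm{d}\mathbf{Y}_t = -\tfrac{1}{2}\beta(t)\,\mathbf{Y}_t\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}\mathbf{W}_t, \qquad \mathbf{Y}_0 \sim P_{\mathbf{Y}|\mathbf{X}=\mathbf{x}},
$$

$$
\widehat{f}(\mathbf{x}) = \frac{1}{M}\sum_{j=1}^{M} \widehat{\mathbf{Y}}^{(j)}(\mathbf{x}), \qquad \widehat{\mathbf{Y}}^{(j)}(\mathbf{x}) \overset{\text{i.i.d.}}{\sim} \widehat{P}_{\mathbf{Y}|\mathbf{X}=\mathbf{x}}.
$$

The first line is the usual variance-preserving forward process, which gradually adds noise to a response drawn from the conditional distribution. The second is the sample-mean estimator computed from $M$ draws of the learned conditional model.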
Abstract
In this work, we propose a novel deep bootstrap framework for nonparametric regression based on conditional diffusion models. Specifically, we construct a conditional diffusion model to learn the distribution of the response variable given the covariates. This model is then used to generate bootstrap samples by pairing the original covariates with newly synthesized responses. We reformulate nonparametric regression as conditional sample mean estimation, which is implemented directly via the learned conditional diffusion model. Unlike traditional bootstrap methods that decouple the estimation of the conditional distribution, sampling, and nonparametric regression, our approach integrates these components into a unified generative framework. With the expressive capacity of diffusion models, our method facilitates both efficient sampling from high-dimensional or multimodal distributions and accurate nonparametric estimation. We establish rigorous theoretical guarantees for the proposed method. In particular, we derive optimal end-to-end convergence rates in the Wasserstein distance between the learned and target conditional distributions. Building on this foundation, we further establish the convergence guarantees of the resulting bootstrap procedure. Numerical studies demonstrate the effectiveness and scalability of our approach for complex regression tasks.
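The resampling loop described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: to keep it runnable in a few lines, the conditional diffusion model is replaced by a hypothetical stand-in (`fit_conditional_sampler`, a Gaussian linear model fitted by least squares), and the percentile-style interval, the function names, and the parameters `n_boot`, `m`, and `alpha` are all assumptions made for illustration. Only the overall structure follows the text: learn a conditional sampler, estimate the regression function as a conditional sample mean, pair the original covariates with newly generated responses, and refit.

```python
import numpy as np


def fit_conditional_sampler(X, Y):
    """Stand-in for training a conditional generative model of Y given X.

    ASSUMPTION: a Gaussian linear model replaces the conditional diffusion
    model so that the sketch runs quickly without a deep-learning library.
    """
    A = np.column_stack([np.ones(len(X)), X])      # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # least-squares fit
    sigma = (Y - A @ coef).std(ddof=A.shape[1])    # residual scale

    def sample(x, m, rng):
        # Draw m responses for each row of x; returns an array of shape (len(x), m).
        mean = np.column_stack([np.ones(len(x)), x]) @ coef
        return mean[:, None] + sigma * rng.standard_normal((len(x), m))

    return sample


def conditional_mean(sample, x, m, rng):
    # Regression estimate as the mean of m generated responses per point.
    return sample(x, m, rng).mean(axis=1)


def generative_bootstrap_ci(X, Y, x_eval, n_boot=200, m=500, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    sampler = fit_conditional_sampler(X, Y)
    f_hat = conditional_mean(sampler, x_eval, m, rng)

    f_star = np.empty((n_boot, len(x_eval)))
    for b in range(n_boot):
        # Bootstrap sample: original covariates paired with synthesized responses.
        Y_star = sampler(X, 1, rng)[:, 0]
        sampler_b = fit_conditional_sampler(X, Y_star)
        f_star[b] = conditional_mean(sampler_b, x_eval, m, rng)

    # ASSUMPTION: a simple percentile interval; the paper's construction may differ.
    lower, upper = np.quantile(f_star, [alpha / 2, 1 - alpha / 2], axis=0)
    return f_hat, lower, upper


# Toy usage on synthetic data with a known regression function.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 2))
Y = 1.0 + X @ np.array([2.0, -1.0]) + 0.5 * rng.standard_normal(300)
x_eval = np.array([[0.0, 0.0], [0.5, -0.5]])
f_hat, lower, upper = generative_bootstrap_ci(X, Y, x_eval)
print(np.round(f_hat, 2), np.round(lower, 2), np.round(upper, 2))
```

In this toy setup the true regression values at the two evaluation points are 1.0 and 2.5, so the printed estimates and intervals can be checked by eye. Swapping the stand-in for a trained conditional diffusion sampler would leave `generative_bootstrap_ci` unchanged, since it only requires a function that draws responses given covariates.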
