Table of Contents
Fetching ...

Deep Bootstrap

Jinyuan Chang, Yuling Jiao, Lican Kang, Junjie Shi

TL;DR

The paper tackles uncertainty quantification in nonparametric regression by marrying conditional diffusion modeling with bootstrap resampling. It learns the conditional distribution $P_{\mathbf{Y}|\mathbf{X}}$ with a variance-preserving diffusion model, uses generated samples to form a regression estimator $\widehat{f}$, and then constructs bootstrap replicas to obtain $\widehat{f}^*$ and confidence intervals. The authors derive sharp end-to-end convergence rates in Wasserstein distance for the conditional diffusion model and establish bootstrap consistency and coverage guarantees, supported by rigorous proofs and numerical experiments. Empirically, the method demonstrates accurate interval coverage and scalability to higher-dimensional covariates, validating the practical utility of integrating diffusion-based generative modeling into bootstrap inference. The framework offers a principled, unified approach to both sampling from complex conditional distributions and nonparametric estimation, with potential extensions to time series and broader nonparametric settings.

Abstract

In this work, we propose a novel deep bootstrap framework for nonparametric regression based on conditional diffusion models. Specifically, we construct a conditional diffusion model to learn the distribution of the response variable given the covariates. This model is then used to generate bootstrap samples by pairing the original covariates with newly synthesized responses. We reformulate nonparametric regression as conditional sample mean estimation, which is implemented directly via the learned conditional diffusion model. Unlike traditional bootstrap methods that decouple the estimation of the conditional distribution, sampling, and nonparametric regression, our approach integrates these components into a unified generative framework. With the expressive capacity of diffusion models, our method facilitates both efficient sampling from high-dimensional or multimodal distributions and accurate nonparametric estimation. We establish rigorous theoretical guarantees for the proposed method. In particular, we derive optimal end-to-end convergence rates in the Wasserstein distance between the learned and target conditional distributions. Building on this foundation, we further establish the convergence guarantees of the resulting bootstrap procedure. Numerical studies demonstrate the effectiveness and scalability of our approach for complex regression tasks.

Deep Bootstrap

TL;DR

The paper tackles uncertainty quantification in nonparametric regression by marrying conditional diffusion modeling with bootstrap resampling. It learns the conditional distribution with a variance-preserving diffusion model, uses generated samples to form a regression estimator , and then constructs bootstrap replicas to obtain and confidence intervals. The authors derive sharp end-to-end convergence rates in Wasserstein distance for the conditional diffusion model and establish bootstrap consistency and coverage guarantees, supported by rigorous proofs and numerical experiments. Empirically, the method demonstrates accurate interval coverage and scalability to higher-dimensional covariates, validating the practical utility of integrating diffusion-based generative modeling into bootstrap inference. The framework offers a principled, unified approach to both sampling from complex conditional distributions and nonparametric estimation, with potential extensions to time series and broader nonparametric settings.

Abstract

In this work, we propose a novel deep bootstrap framework for nonparametric regression based on conditional diffusion models. Specifically, we construct a conditional diffusion model to learn the distribution of the response variable given the covariates. This model is then used to generate bootstrap samples by pairing the original covariates with newly synthesized responses. We reformulate nonparametric regression as conditional sample mean estimation, which is implemented directly via the learned conditional diffusion model. Unlike traditional bootstrap methods that decouple the estimation of the conditional distribution, sampling, and nonparametric regression, our approach integrates these components into a unified generative framework. With the expressive capacity of diffusion models, our method facilitates both efficient sampling from high-dimensional or multimodal distributions and accurate nonparametric estimation. We establish rigorous theoretical guarantees for the proposed method. In particular, we derive optimal end-to-end convergence rates in the Wasserstein distance between the learned and target conditional distributions. Building on this foundation, we further establish the convergence guarantees of the resulting bootstrap procedure. Numerical studies demonstrate the effectiveness and scalability of our approach for complex regression tasks.
Paper Structure (38 sections, 30 theorems, 427 equations, 4 tables, 2 algorithms)

This paper contains 38 sections, 30 theorems, 427 equations, 4 tables, 2 algorithms.

Key Result

Lemma 3.1

Suppose that Assumptions ass: bounded_density-ass: bounded_derivative hold. Let $\mathcal{M} \gg 1, C_T > 0$, and $T = \mathcal{M}^{-C_T}$. Then we can choose a ReLU neural network $\mathbf{b}\in\mathrm{NN}(L,M,J,\kappa)$ that satisfies for a constant $C_0 = \mathcal{O}(\sqrt{\log\mathcal{M}})$, and and has the following structure: Moreover, for any $t\in[\mathcal{M}^{-C_T}, 1 - \mathcal{M}^{-C

Theorems & Definitions (58)

  • Definition 1.1: ReLU DNNs
  • Definition 1.2: Wasserstein distance
  • Definition 1.3: Covering number
  • Definition 1.4: ($\beta$, $R$)-Hölder Class
  • Remark 3.1
  • Lemma 3.1: Approximation Error
  • Remark 3.2
  • Lemma 3.2: Statistical Error
  • Theorem 3.3: Error Bound for Conditional Score Estimation
  • Theorem 3.4
  • ...and 48 more