Generative modeling for the bootstrap

Leon Tran, Ting Ye, Peng Ding, Fang Han

TL;DR

The generative modeling-based bootstrap learns a generator $\widehat{\bm{G}}_n$ that maps noise $\bm{U}$ to synthetic data approximating the unknown distribution $\mathrm{P}_Z$, unifying the classical, parametric, and smoothed bootstraps. The authors prove bootstrap consistency for regular M-estimators and for irregular estimators such as isotonic regression under broad assumptions on the data, noise, and generator, establishing that the conditional bootstrap distributions converge to the same limits as the original estimators. Concrete instantiations via GANs (e.g., W-GAN) and flow-based models (affine autoregressive flows) are shown to satisfy the framework's assumptions, with flow bootstraps offering stronger guarantees for irregular problems. Simulations compare the original, smoothed, GAN, and flow bootstraps on OLS and isotonic regression. They indicate that the GAN and flow bootstraps can match or exceed the original bootstrap's performance and are more robust to high dimensionality than kernel-based smoothing.

Abstract

Generative modeling builds on and substantially advances the classical idea of simulating synthetic data from observed samples. This paper shows that this principle is not only natural but also theoretically well-founded for bootstrap inference: it yields statistically valid confidence intervals that apply simultaneously to both regular and irregular estimators, including settings in which Efron's bootstrap fails. In this sense, the generative modeling-based bootstrap can be viewed as a modern version of the smoothed bootstrap: it could mitigate the curse of dimensionality and remain effective in challenging regimes where estimators may lack root-$n$ consistency or a Gaussian limit.
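To make the recipe concrete, here is a minimal sketch of the generate-then-re-estimate loop for an OLS slope. It is an illustration under stated assumptions, not the authors' implementation. The generator is a hypothetical Gaussian stand-in, $G(u) = \mu + Lu$ with $u \sim N(0, I)$, fitted by the sample mean and covariance, rather than a trained GAN or autoregressive flow. The function names (`fit_affine_generator`, `ols_slope`, `generative_bootstrap_ci`) and the choice of a basic bootstrap interval are likewise assumptions made for the example.

```python
import numpy as np

def fit_affine_generator(z):
    """Fit G(u) = mu + L u with u ~ N(0, I).

    Hypothetical stand-in for a learned generator: it matches only the
    sample mean and covariance of the data (a Gaussian fit).
    """
    mu = z.mean(axis=0)
    cov = np.cov(z, rowvar=False, bias=True)
    L = np.linalg.cholesky(cov)  # requires a positive-definite covariance
    return lambda u: mu + u @ L.T

def ols_slope(z):
    """OLS slope of column 1 (response) on column 0 (covariate), with intercept."""
    x, y = z[:, 0], z[:, 1]
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def generative_bootstrap_ci(z, n_boot=500, alpha=0.05, seed=0):
    """Basic bootstrap interval from synthetic samples drawn through the generator."""
    rng = np.random.default_rng(seed)
    n, d = z.shape
    G = fit_affine_generator(z)
    theta_hat = ols_slope(z)
    # Each replicate: draw fresh noise, push it through G, re-estimate.
    draws = np.array([
        ols_slope(G(rng.standard_normal((n, d)))) for _ in range(n_boot)
    ])
    # For this Gaussian generator, the slope under the generated
    # distribution equals theta_hat, so we center the draws there.
    q_lo, q_hi = np.quantile(draws - theta_hat, [alpha / 2, 1 - alpha / 2])
    return theta_hat, (theta_hat - q_hi, theta_hat - q_lo)

# Usage on simulated data with true slope 2.0
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = 2.0 * x + rng.standard_normal(200)
theta_hat, (ci_lo, ci_hi) = generative_bootstrap_ci(np.column_stack([x, y]))
print(theta_hat, ci_lo, ci_hi)
```

Swapping `fit_affine_generator` for a trained GAN or flow would leave the rest of the loop unchanged. The centering step would then need the estimand under the learned generator, which in general has to be approximated.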

Paper Structure (27 sections, 39 theorems, 296 equations, 2 tables)

Key Result

Theorem 3.1

Under Assumptions assump_ndata-assump_bsmest, we have

Theorems & Definitions (62)

  • Definition 2.1: Neural networks
  • Example 2.1: Wasserstein GAN-based generative models (Arjovsky et al., 2017)
  • Definition 2.2: Bijective monotone upper triangular functions
  • Definition 2.3: Affine autoregressive flows
  • Example 2.2: Affine autoregressive flow-based generative models
  • Theorem 3.1: Bootstrap consistency, regular M-estimators
  • Remark 3.1
  • Theorem 4.1: Bootstrap consistency, isotonic regression
  • Theorem 5.1: GAN bootstrap
  • Theorem 5.2: Flow bootstrap
  • ...and 52 more