Generative modeling for the bootstrap
Leon Tran, Ting Ye, Peng Ding, Fang Han
TL;DR
The generative modeling-based bootstrap learns a generator ${\widehat{\bm{G}}_n}$ that maps noise ${\bm{U}}$ to synthetic data whose distribution approximates the unknown distribution ${\mathrm P}_Z$, unifying the classical, parametric, and smoothed bootstraps. The authors prove bootstrap consistency for regular M-estimators and for irregular estimators such as isotonic regression under broad assumptions on the data, noise, and generator, establishing that the conditional bootstrap distributions converge to the same limits as those of the original estimators. Concrete instantiations via GANs (e.g., W-GAN) and flow-based models (affine autoregressive flows) are shown to satisfy the framework’s assumptions, with flow bootstraps offering stronger guarantees for irregular problems. Simulations compare the original, smoothed, GAN, and flow bootstraps on OLS and isotonic regression. They indicate that the GAN and flow bootstraps can match or exceed the original bootstrap’s performance and are more robust to high dimensionality than kernel-based smoothing, which supports their practical use in challenging inferential settings.
Abstract
Generative modeling builds on and substantially advances the classical idea of simulating synthetic data from observed samples. This paper shows that this principle is not only natural but also theoretically well-founded for bootstrap inference: it yields statistically valid confidence intervals that apply simultaneously to both regular and irregular estimators, including settings in which Efron's bootstrap fails. In this sense, the generative modeling-based bootstrap can be viewed as a modern version of the smoothed bootstrap: it could mitigate the curse of dimensionality and remain effective in challenging regimes where estimators may lack root-$n$ consistency or a Gaussian limit.
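To make the resampling scheme concrete, the following is a minimal sketch of a generator-based bootstrap for an OLS slope. It is an illustration under stated assumptions, not the authors' implementation: the generator here is a simple affine map of Gaussian noise fitted by matching the sample mean and covariance, standing in for the W-GAN or autoregressive flow discussed in the paper. The function names (`fit_affine_generator`, `ols_slope`, `generative_bootstrap`) and the choice of a percentile interval are hypothetical and chosen only for this example.

```python
import numpy as np


def fit_affine_generator(Z):
    """Fit a toy generator G(u) = mu + L u from data Z of shape (n, d).

    Assumption: a Gaussian affine map stands in for a learned GAN or flow.
    """
    mu = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False)
    # Small jitter keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(Z.shape[1]))

    def G(U):
        # Map noise rows U (n, d) to synthetic data rows (n, d).
        return mu + U @ L.T

    return G


def ols_slope(Z):
    """OLS slope(s) of the last column of Z on the other columns."""
    X, y = Z[:, :-1], Z[:, -1]
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]  # drop the intercept


def generative_bootstrap(Z, estimator, n_boot=500, seed=0):
    """Percentile interval from estimates recomputed on synthetic samples."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    G = fit_affine_generator(Z)
    theta_hat = estimator(Z)
    # Each bootstrap sample is G applied to fresh noise of the same size n.
    draws = np.array(
        [estimator(G(rng.standard_normal((n, d)))) for _ in range(n_boot)]
    )
    lo, hi = np.quantile(draws, [0.025, 0.975], axis=0)
    return theta_hat, lo, hi


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal(200)
    y = 2.0 * x + rng.standard_normal(200)
    theta_hat, lo, hi = generative_bootstrap(np.column_stack([x, y]), ols_slope)
    print("estimate:", theta_hat, "95% interval:", lo, hi)
```

With this particular generator the procedure reduces to a Gaussian parametric bootstrap, which is one of the special cases the framework is said to unify. Replacing `fit_affine_generator` with a trained GAN or flow sampler would leave the rest of the loop unchanged.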
