An Online Bootstrap for Time Series
Nicolai Palm, Thomas Nagler
TL;DR
An Online Bootstrap for Time Series develops an online bootstrap that preserves the dependence structure of streaming data by using autoregressive resampling weights. Each observation is reweighted as $X_i^*=\frac{V_i}{\overline{V}_n}X_i$, where $\overline{V}_n$ is the average of $V_1,\dots,V_n$ and the weights follow the autoregressive recursion $V_i=1+\rho_i(V_{i-1}-1)+\sqrt{1-\rho_i^2}\,\zeta_i$ with innovations $\zeta_i$ and $\rho_i=1-i^{-\beta}$. Because each new weight depends only on the previous one, the bootstrap can be updated cheaply as data arrive while still yielding consistent uncertainty quantification. The authors establish asymptotic validity under stationarity and $\alpha$-mixing, identify the optimal choice $\beta_{\mathrm{opt}}=\sqrt{2}-1$, and show a convergence rate of $\mathcal{O}(n^{-\beta/(1+\beta)})$ for variance estimation. In simulations on i.i.d., moving-average (MA), nonlinear, and GARCH-type processes, the AR-bootstrap achieves reliable coverage with computation times competitive with block-based methods. The approach also extends to transformed statistics and multivariate settings via the delta method. The framework offers a practical tool for online uncertainty quantification in machine-learning tasks such as empirical risk minimization and bandit algorithms, bridging classical resampling with the needs of modern streaming data.
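The online update described above can be sketched in Python for the simplest statistic, the sample mean. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `OnlineARBootstrap`, its methods, the choice of i.i.d. standard normal innovations $\zeta_i$, and the use of the standard deviation of the bootstrap replicates as a standard-error estimate are hypothetical choices made for this example. For each of $B$ replicates it stores only the current weight $V_i$ and the running sums $\sum_i V_i X_i$ and $\sum_i V_i$. Since the mean of the $X_i^*$ equals $\sum_i V_i X_i / \sum_i V_i$, each new observation costs $\mathcal{O}(B)$ time and memory regardless of $n$.

```python
import math
import random
import statistics


class OnlineARBootstrap:
    """Hypothetical online AR-bootstrap sketch for the sample mean.

    Assumption (not from the paper): innovations zeta_i are i.i.d. N(0, 1).
    """

    def __init__(self, n_boot=200, beta=math.sqrt(2) - 1, seed=None):
        self.n_boot = n_boot
        self.beta = beta  # default taken from the stated optimum sqrt(2) - 1
        self.rng = random.Random(seed)
        self.n = 0
        self.sum_x = 0.0
        # Per-replicate state: current weight V_i and running sums of V_i * X_i and V_i.
        self.v = [1.0] * n_boot
        self.sum_vx = [0.0] * n_boot
        self.sum_v = [0.0] * n_boot

    def update(self, x):
        """Absorb one observation in O(n_boot) time, without storing past data."""
        self.n += 1
        self.sum_x += x
        rho = 1.0 - self.n ** (-self.beta)  # rho_1 = 0; rho_i -> 1 as i grows
        scale = math.sqrt(1.0 - rho * rho)
        for b in range(self.n_boot):
            zeta = self.rng.gauss(0.0, 1.0)  # assumed standard normal innovation
            self.v[b] = 1.0 + rho * (self.v[b] - 1.0) + scale * zeta
            self.sum_vx[b] += self.v[b] * x
            self.sum_v[b] += self.v[b]

    def estimate(self):
        """Plain sample mean of the observations seen so far."""
        return self.sum_x / self.n

    def replicates(self):
        # Mean of X_i^* = (V_i / Vbar_n) X_i simplifies to sum(V_i X_i) / sum(V_i).
        return [svx / sv for svx, sv in zip(self.sum_vx, self.sum_v)]

    def std_error(self):
        """Bootstrap standard error: spread of the replicates around each other."""
        return statistics.stdev(self.replicates())


# Usage: stream an AR(1) series and report an approximate 95% normal interval.
data_rng = random.Random(0)
boot = OnlineARBootstrap(n_boot=200, seed=1)
x = 0.0
for _ in range(2000):
    x = 0.5 * x + data_rng.gauss(0.0, 1.0)
    boot.update(x)
est, se = boot.estimate(), boot.std_error()
print(f"mean={est:.3f}, approx. 95% CI [{est - 1.96 * se:.3f}, {est + 1.96 * se:.3f}]")
```

Since $\rho_1 = 0$, the first weight is $V_1 = 1 + \zeta_1$, so no initial value $V_0$ is needed. The normal-approximation interval in the usage lines is one simple choice; percentile intervals computed from `replicates()` would be another.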
Abstract
Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series or spatially correlated observations. In this paper, we propose a novel bootstrap method that is designed to account for data dependencies and can be executed online, making it particularly suitable for real-time applications. This method is based on an autoregressive sequence of increasingly dependent resampling weights. We prove the theoretical validity of the proposed bootstrap scheme under general conditions. We demonstrate the effectiveness of our approach through extensive simulations and show that it provides reliable uncertainty quantification even in the presence of complex data dependencies. Our work bridges the gap between classical resampling techniques and the demands of modern data analysis, providing a valuable tool for researchers and practitioners in dynamic, data-rich environments.
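The phrase "increasingly dependent resampling weights" can be made concrete with a short moment calculation for the weight recursion $V_i=1+\rho_i(V_{i-1}-1)+\sqrt{1-\rho_i^2}\,\zeta_i$ with $\rho_i=1-i^{-\beta}$. This sketch assumes, for illustration, that each innovation $\zeta_i$ is independent of the past with mean $0$ and variance $1$. Since $\rho_1=0$, the base case $V_1=1+\zeta_1$ has mean $1$ and variance $1$, and induction over $i$ (with $1 \le k < i$) gives:

$$
\begin{aligned}
\mathbb{E}[V_i] &= 1 + \rho_i\big(\mathbb{E}[V_{i-1}] - 1\big) = 1,\\
\operatorname{Var}(V_i) &= \rho_i^2\operatorname{Var}(V_{i-1}) + (1-\rho_i^2)\operatorname{Var}(\zeta_i) = 1,\\
\operatorname{Corr}(V_i, V_{i-k}) &= \prod_{j=i-k+1}^{i} \rho_j, \qquad \rho_j = 1 - j^{-\beta} \to 1 \quad (\beta > 0).
\end{aligned}
$$

The factor $\sqrt{1-\rho_i^2}$ is what keeps every weight at unit variance, while for any fixed lag $k$ the correlation tends to $1$ as $i$ grows. Loosely speaking, later observations share nearly the same weight over ever longer stretches, playing a role similar to the growing block length in block bootstraps.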
