Online Bernstein-von Mises theorem
Jeyong Lee, Junhyeok Choi, Minwoo Chae
TL;DR
This work analyzes Bayesian online learning with sequentially arriving mini-batches and proposes a variational Gaussian projection at each step to keep the updates tractable. Under mild regularity and self-concordance-type smoothness assumptions, it establishes nonasymptotic bounds, in the form of online Bernstein-von Mises theorems, showing that the online variational posterior is asymptotically indistinguishable from the full posterior. The key tools are Laplace approximations, penalized M-estimation with informative priors, and precise control of remainder terms; the resulting rates depend on the mini-batch size $n$ and the effective dimension $p_* = p \vee \log n \vee \log T$. A logistic-regression example and numerical experiments illustrate the theory: online Bayesian updates yield valid uncertainty quantification and near-batch performance when mini-batches are sufficiently large, with room for improvement when $n$ is very small or the dimension is high. Overall, the results give a principled justification for online Bayesian inference with sequential data and quantify how prior information increases the effective sample size.
Abstract
Online learning is an inferential paradigm in which parameters are updated incrementally from sequentially available data, in contrast to batch learning, where the entire dataset is processed at once. In this paper, we assume that mini-batches from the full dataset become available sequentially. The Bayesian framework, which updates beliefs about unknown parameters after observing each mini-batch, is naturally suited for online learning. At each step, we update the posterior distribution using the current prior and new observations, with the updated posterior serving as the prior for the next step. However, this recursive Bayesian updating is rarely computationally tractable unless the model and prior are conjugate. When the model is regular, the updated posterior can be approximated by a normal distribution, as justified by the Bernstein-von Mises theorem. We adopt a variational approximation at each step and investigate the frequentist properties of the final posterior obtained through this sequential procedure. Under mild assumptions, we show that the accumulated approximation error becomes negligible once the mini-batch size exceeds a threshold depending on the parameter dimension. As a result, the sequentially updated posterior is asymptotically indistinguishable from the full posterior.
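The recursive scheme described above (Gaussian prior in, mini-batch observed, Gaussian approximation out, reused as the next prior) can be illustrated with a minimal sketch for logistic regression. This is not the paper's algorithm: the paper uses a variational Gaussian projection, whereas the sketch substitutes a Laplace approximation (posterior mode and curvature at the mode) as a simpler stand-in. The function names `laplace_update`, `online_posterior`, and `simulate`, along with the simulated data, the dimension, and the batch sizes, are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def log_post(theta, mu, prec, X, y):
    # Logistic log-likelihood plus Gaussian log-prior N(mu, prec^{-1}), up to a constant.
    z = X @ theta
    d = theta - mu
    return np.sum(y * z - np.logaddexp(0.0, z)) - 0.5 * d @ prec @ d


def neg_hessian(theta, prec, X):
    # Negative Hessian of the log posterior: X^T W X + prior precision.
    w = sigmoid(X @ theta)
    w = w * (1.0 - w)
    return X.T @ (X * w[:, None]) + prec


def laplace_update(mu, prec, X, y, max_iter=50, tol=1e-10):
    """One online step: Gaussian prior + mini-batch -> Gaussian (Laplace) approximation.

    Returns the posterior mode and the precision (negative Hessian) at the mode.
    """
    theta = mu.copy()
    for _ in range(max_iter):
        grad = X.T @ (y - sigmoid(X @ theta)) - prec @ (theta - mu)
        step = np.linalg.solve(neg_hessian(theta, prec, X), grad)
        # Backtracking: halve the Newton step until the log posterior does not decrease.
        t, f0 = 1.0, log_post(theta, mu, prec, X, y)
        while log_post(theta + t * step, mu, prec, X, y) < f0 and t > 1e-8:
            t *= 0.5
        theta = theta + t * step
        if np.linalg.norm(t * step) < tol:
            break
    return theta, neg_hessian(theta, prec, X)


def online_posterior(batches, mu0, prec0):
    # The Gaussian approximation after each mini-batch becomes the prior for the next.
    mu, prec = mu0, prec0
    for X, y in batches:
        mu, prec = laplace_update(mu, prec, X, y)
    return mu, prec


def simulate(seed=0, n=200, T=20):
    # Hypothetical data: T mini-batches of size n from a 3-dimensional logistic model.
    rng = np.random.default_rng(seed)
    theta_true = np.array([1.0, -0.5, 0.25])
    batches = []
    for _ in range(T):
        X = rng.standard_normal((n, theta_true.size))
        y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)
        batches.append((X, y))
    return batches, theta_true


if __name__ == "__main__":
    batches, theta_true = simulate()
    mu0, prec0 = np.zeros(3), np.eye(3)
    mu_on, prec_on = online_posterior(batches, mu0, prec0)
    # Batch reference: a single Laplace approximation on all data with the initial prior.
    X_all = np.vstack([X for X, _ in batches])
    y_all = np.concatenate([y for _, y in batches])
    mu_b, prec_b = laplace_update(mu0, prec0, X_all, y_all)
    print("online mean:", mu_on)
    print("batch mean: ", mu_b)
    print("distance between means:", np.linalg.norm(mu_on - mu_b))
```

Comparing the online mean and precision with the batch reference gives a rough, informal view of the accumulated approximation error; shrinking `n` while holding `n * T` fixed is one way to probe how that error depends on the mini-batch size.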
