Online Bernstein-von Mises theorem

Jeyong Lee, Junhyeok Choi, Minwoo Chae

TL;DR

This work rigorously analyzes Bayesian online learning with sequential mini-batches, proposing a variational Gaussian projection at each step to keep the updates tractable. Under mild regularity and self-concordance-type smoothness assumptions, it establishes nonasymptotic bounds, in the form of online Bernstein-von Mises theorems, showing that the online variational posterior is asymptotically indistinguishable from the full posterior. The paper develops key tools including Laplace approximations, penalized M-estimation with informative priors, and precise control of remainder terms, yielding rates that depend on the batch size $n$ and the effective dimension $p_* = p \vee \log n \vee \log T$. A concrete logistic-regression example and numerical experiments illustrate the theory, demonstrating that online Bayesian updates can yield valid uncertainty quantification and near-batch performance when mini-batches are sufficiently large, while highlighting room for improvement when $n$ is very small or the dimension is high. Overall, the results provide a principled justification for online Bayesian inference with sequential data and quantify how prior information enhances the effective sample size, enabling reliable inference in streaming settings.
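
As a quick worked instance of the effective dimension, assume natural logarithms and the purely hypothetical values $p = 10$, $n = 1000$, $T = 100$ (none of these come from the paper):

$$p_* = p \vee \log n \vee \log T = 10 \vee 6.91 \vee 4.61 = 10.$$

Here the parameter dimension dominates. The logarithmic terms only take over when $p$ is small: with $p = 3$ and the same $n$ and $T$, the maximum would instead be $\log n \approx 6.91$.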

Abstract

Online learning is an inferential paradigm in which parameters are updated incrementally from sequentially available data, in contrast to batch learning, where the entire dataset is processed at once. In this paper, we assume that mini-batches from the full dataset become available sequentially. The Bayesian framework, which updates beliefs about unknown parameters after observing each mini-batch, is naturally suited for online learning. At each step, we update the posterior distribution using the current prior and new observations, with the updated posterior serving as the prior for the next step. However, this recursive Bayesian updating is rarely computationally tractable unless the model and prior are conjugate. When the model is regular, the updated posterior can be approximated by a normal distribution, as justified by the Bernstein-von Mises theorem. We adopt a variational approximation at each step and investigate the frequentist properties of the final posterior obtained through this sequential procedure. Under mild assumptions, we show that the accumulated approximation error becomes negligible once the mini-batch size exceeds a threshold depending on the parameter dimension. As a result, the sequentially updated posterior is asymptotically indistinguishable from the full posterior.
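
The recursive scheme described in the abstract can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses a logistic-regression model, and it replaces the paper's variational Gaussian projection with a plain Laplace approximation (Gaussian centered at the posterior mode, precision equal to the negative log-posterior Hessian). The function names `laplace_update` and `run_online`, the simulated data, the batch size, and the prior precision are all hypothetical choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_update(mu, prec, X, y, n_iter=25):
    """One online step (illustrative): combine a Gaussian prior N(mu, prec^{-1})
    with a logistic-regression mini-batch (X, y) and return the mean and
    precision of a Gaussian (Laplace) approximation to the updated posterior."""
    theta = mu.copy()
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        # Gradient of the log posterior: log-likelihood term plus Gaussian prior term.
        grad = X.T @ (y - p) - prec @ (theta - mu)
        # Negative Hessian of the log posterior (positive definite).
        neg_hess = X.T @ (X * (p * (1.0 - p))[:, None]) + prec
        theta = theta + np.linalg.solve(neg_hess, grad)  # Newton step
    p = sigmoid(X @ theta)
    new_prec = X.T @ (X * (p * (1.0 - p))[:, None]) + prec
    return theta, new_prec

def run_online(X, y, batch_size, prior_prec=1.0):
    """Process the data in consecutive mini-batches; the Gaussian approximation
    obtained after each batch serves as the prior for the next one."""
    dim = X.shape[1]
    mu, prec = np.zeros(dim), prior_prec * np.eye(dim)
    for start in range(0, X.shape[0], batch_size):
        stop = start + batch_size
        mu, prec = laplace_update(mu, prec, X[start:stop], y[start:stop])
    return mu, prec

# Simulated data (assumed setup, not taken from the paper).
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(4000, 3))
y = (rng.uniform(size=4000) < sigmoid(X @ theta_true)).astype(float)

mu_online, prec_online = run_online(X, y, batch_size=200)   # 20 sequential updates
mu_batch, prec_batch = run_online(X, y, batch_size=4000)    # single full-data update

print("online mean:", mu_online)
print("batch mean: ", mu_batch)
```

Comparing `mu_online` with `mu_batch` (and the two precision matrices) gives an informal feel for the question the paper studies: how far the sequentially updated Gaussian drifts from the approximation built on the full dataset as the mini-batch size changes.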

Paper Structure

This paper contains 33 sections, 51 theorems, 651 equations, 1 figure, 3 tables.

Key Result

Theorem 3.1

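Suppose that (A0) holds. Then, with probability 1, we have [displayed bound on the total variation distance for the Laplace approximation; equation not preserved in this extract], where [definition not preserved in this extract] and $K > 0$ is a universal constant.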

Figures (1)

  • Figure 1: Relative efficiency in terms of MSE across varying mini-batch size.

Theorems & Definitions (102)

  • Theorem 3.1: Laplace approximation: TV distance
  • Theorem 3.2: Laplace approximation: KL divergence
  • Theorem 3.3: Variational approximation
  • Theorem 4.1
  • Proposition 5.1
  • proof
  • Theorem 6.1
  • Theorem 6.2
  • Proposition 7.1
  • Proposition 7.2
  • ...and 92 more