Bernstein-von Mises for Adaptively Collected Data

Kevin Du, Yash Nair, Lucas Janson

TL;DR

This work extends Bernstein–von Mises (BvM) theory to adaptively collected data, showing that Bayesian uncertainty quantification (UQ) becomes asymptotically equivalent to Wald-type frequentist UQ in broad adaptive settings. The paper proves that the posterior for the parameter vector $\beta$ is asymptotically normal, centered at the maximum likelihood estimator with covariance $\sigma^2(\mathbf{X}_n^\top\mathbf{X}_n)^{-1}$. This connects Bayesian and frequentist UQ under adaptivity without relying on the stability conditions that frequentist analyses traditionally require. The contributions cover a general adaptive linear Gaussian framework, specialized results for multi-armed and contextual bandits (including exponential-family extensions), and a triangular-array formulation. Numerical experiments show when Bayesian UQ aligns with or diverges from frequentist guarantees. The findings clarify the limits of Bayesian credible intervals in adaptive experiments and motivate further work on empirical Bayes approaches and broader parametric models. Overall, the results deepen our understanding of uncertainty quantification in sequential, adaptive data collection and of its implications for decision-making in safety-critical and online settings.

Abstract

Uncertainty quantification (UQ) for adaptively collected data, such as that coming from adaptive experiments, bandits, or reinforcement learning, is necessary for critical elements of data collection such as ensuring safety and conducting after-study inference. The data's adaptivity creates significant challenges for frequentist UQ, yet Bayesian UQ remains the same as if the data were independent and identically distributed (i.i.d.), making it an appealing and commonly used approach. Bayesian UQ requires the (correct) specification of a prior distribution while frequentist UQ does not. For i.i.d. data, however, the celebrated Bernstein-von Mises theorem shows that as the sample size grows, the prior 'washes out' and Bayesian UQ becomes frequentist-valid. The choice of prior therefore need not be a major impediment to Bayesian UQ, as it makes no difference asymptotically. This paper for the first time extends the Bernstein-von Mises theorem to adaptively collected data, proving asymptotic equivalence between Bayesian UQ and Wald-type frequentist UQ in this challenging setting. Our result showing this asymptotic agreement does not require the standard stability condition required by works studying validity of Wald-type frequentist UQ. In cases where stability is satisfied, our results combined with these prior studies of frequentist UQ imply frequentist validity of Bayesian UQ. Counterintuitively, however, they also yield a negative result: Bayesian UQ is not asymptotically frequentist valid when stability fails, even though the prior washes out and Bayesian UQ asymptotically matches standard Wald-type frequentist UQ. We empirically validate our theory (positive and negative) via a range of simulations.
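
The negative result concerns the frequentist coverage of Bayesian credible intervals when stability fails. The following Python sketch is a minimal, hypothetical version of the kind of experiment reported in Figure 3. It is not the authors' code. It makes the following assumptions:
- two Gaussian arms with unit noise and $\mathcal{N}(0, 1)$ priors;
- batch 1 split evenly between the arms;
- batch 2 allocated by Thompson sampling, with each unit assigned to arm 0 with the batch-1 posterior probability that arm 0 has the larger mean;
- a batch size far smaller than the paper's $10^4$.

The function name `two_batch_ts_coverage` is illustrative. Setting equal arm means gives a case where the batch-2 allocation probability stays random as the batch size grows, which is where stability fails.

```python
import math
import numpy as np


def two_batch_ts_coverage(mu, n_per_batch, n_reps, z=1.959964, seed=None):
    """Frequentist coverage of the Bayesian credible interval for mu[0] - mu[1]
    after a two-batch Thompson-sampling experiment (unit noise, N(0, 1) priors).
    Hypothetical setup for illustration only."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    erfc = np.vectorize(math.erfc)

    def posterior(n, s):
        # Conjugate update: prior N(0, 1), noise variance 1 -> precision 1 + n, mean s / (1 + n).
        prec = 1.0 + n
        return s / prec, 1.0 / prec

    # Batch 1 (assumption): split the batch evenly between the two arms.
    half = n_per_batch // 2
    n = np.tile([half, n_per_batch - half], (n_reps, 1)).astype(float)
    # The sum of n i.i.d. N(mu_k, 1) rewards is N(n * mu_k, n).
    s = rng.normal(n * mu, np.sqrt(n))
    m, v = posterior(n, s)

    # Thompson sampling sends each batch-2 unit to arm 0 with the posterior probability
    # that arm 0 has the larger mean: Phi(diff / sd) = erfc(-diff / (sd * sqrt(2))) / 2.
    diff_sd = np.sqrt(v[:, 0] + v[:, 1])
    p0 = 0.5 * erfc(-(m[:, 0] - m[:, 1]) / (diff_sd * math.sqrt(2.0)))
    n0 = rng.binomial(n_per_batch, p0)
    n2 = np.stack([n0, n_per_batch - n0], axis=1).astype(float)

    # Batch-2 rewards, then the final posterior from both batches.
    s = s + rng.normal(n2 * mu, np.sqrt(n2))
    m, v = posterior(n + n2, s)

    # Equal-tailed 95% credible interval for the margin (Gaussian posterior).
    margin_mean = m[:, 0] - m[:, 1]
    margin_sd = np.sqrt(v[:, 0] + v[:, 1])
    covered = np.abs(margin_mean - (mu[0] - mu[1])) <= z * margin_sd
    return float(covered.mean())


if __name__ == "__main__":
    # Large margin (stable allocation) versus zero margin (unstable allocation).
    print(two_batch_ts_coverage([1.0, 0.0], n_per_batch=1000, n_reps=4000, seed=0))
    print(two_batch_ts_coverage([0.0, 0.0], n_per_batch=1000, n_reps=4000, seed=0))
```

The sketch tracks only per-arm counts and reward sums, because these are sufficient statistics for the conjugate Gaussian posterior. This lets all replicates be simulated in vectorized form.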

Paper Structure

This paper contains 23 sections, 12 theorems, 54 equations, 6 figures, 2 algorithms.

Key Result

Theorem 1

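Suppose $\pi(\beta)$ is the prior distribution for $\beta$ and $\pi(\beta \mid H_n)$ is the posterior after observing the trajectory $H_n$. Let $\hat{\beta}_n = (\mathbf{X}_n^\top \mathbf{X}_n)^{-1} \mathbf{X}_n^\top \mathbf{y}_n$ be the MLE for $\beta$ and let $\beta_0$ be the true value of $\beta$. Then the posterior distribution $\pi(\beta \mid H_n)$ satisfies

$$\Big\| \pi(\beta \mid H_n) - \mathcal{N}\big(\hat{\beta}_n,\, \sigma^2 (\mathbf{X}_n^\top \mathbf{X}_n)^{-1}\big) \Big\|_{\mathrm{TV}} \xrightarrow{\;p\;} 0 \quad \text{as } n \to \infty,$$

where convergence in probability is under the data-generating distribution with $\beta = \beta_0$.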
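
The Python sketch below (NumPy only) illustrates the BvM statement numerically. It uses a linear Gaussian model with known noise level $\sigma$ and an independent $\mathcal{N}(0, \tau^2 I)$ prior, for which the posterior is available in closed form. The epsilon-greedy design rule, the covariate vectors, and the function names (`eps_greedy_design`, `bvm_tv_linear`, `tv_gaussians_mc`) are hypothetical choices for illustration, not the paper's experimental code. The TV distance between the exact posterior and $\mathcal{N}(\hat\beta_n, \sigma^2(\mathbf{X}_n^\top\mathbf{X}_n)^{-1})$ is estimated by Monte Carlo using $\mathrm{TV}(P, Q) = \mathbb{E}_P[(1 - q/p)_+]$.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) at each row of x (shape: n_draws x d)."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    # ||chol^{-1} (x - mean)||^2 is the squared Mahalanobis distance.
    z = np.linalg.solve(chol, (x - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (np.sum(z**2, axis=0) + log_det + d * np.log(2.0 * np.pi))


def tv_gaussians_mc(m1, c1, m2, c2, n_draws=200_000, seed=None):
    """Monte Carlo estimate of TV(P, Q) = E_P[(1 - q/p)_+], P = N(m1, c1), Q = N(m2, c2)."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(m1, c1, size=n_draws)
    log_ratio = gaussian_logpdf(x, m2, c2) - gaussian_logpdf(x, m1, c1)
    return float(np.mean(np.clip(1.0 - np.exp(log_ratio), 0.0, None)))


def eps_greedy_design(beta0, arms, n, sigma=1.0, eps=0.1, seed=None):
    """Hypothetical adaptive rule: epsilon-greedy over a fixed set of covariate vectors."""
    rng = np.random.default_rng(seed)
    d = len(beta0)
    X, y = np.empty((n, d)), np.empty(n)
    XtX, Xty = np.zeros((d, d)), np.zeros(d)
    for t in range(n):
        if t < len(arms):
            a = t  # forced exploration; assumes the arm vectors span R^d
        elif rng.random() < eps:
            a = int(rng.integers(len(arms)))
        else:
            beta_hat = np.linalg.solve(XtX, Xty)  # current least-squares estimate
            a = int(np.argmax(arms @ beta_hat))
        X[t] = arms[a]
        y[t] = arms[a] @ beta0 + sigma * rng.standard_normal()
        XtX += np.outer(X[t], X[t])
        Xty += X[t] * y[t]
    return X, y


def bvm_tv_linear(X, y, sigma=1.0, tau=1.0, n_draws=200_000, seed=None):
    """TV between the conjugate N(0, tau^2 I) posterior and N(beta_hat, sigma^2 (X'X)^{-1})."""
    d = X.shape[1]
    XtX = X.T @ X
    # Selection probabilities depend on past data but not on beta, so they drop out
    # of the likelihood and the usual conjugate Gaussian update still applies.
    post_cov = np.linalg.inv(XtX / sigma**2 + np.eye(d) / tau**2)
    post_mean = post_cov @ (X.T @ y) / sigma**2
    beta_hat = np.linalg.solve(XtX, X.T @ y)  # least squares = Gaussian MLE
    bvm_cov = sigma**2 * np.linalg.inv(XtX)
    return tv_gaussians_mc(post_mean, post_cov, beta_hat, bvm_cov, n_draws, seed)


if __name__ == "__main__":
    beta0 = np.array([0.5, -0.2])
    arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
    X, y = eps_greedy_design(beta0, arms, n=2000, seed=0)
    print(bvm_tv_linear(X, y, seed=1))
```

Under the theorem, the estimate should shrink toward zero as `n` grows, up to Monte Carlo error from the finite number of posterior draws.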

Figures (6)

  • Figure 1: (Left) Average TV distance measured in the BvM statement for UCB in two-arm Gaussian bandits over horizon $T=10^4$ using $10^4$ replicates under five different true parameter configurations labelled by $[\mu_1, \mu_2]$ where $\mu_1,\mu_2$ are the true means. (Right) Average TV distance measured in the BvM statement for lin-UCB on three-arm Gaussian linear contextual bandits with context distribution $\mathcal{N}(0, I_{2 \times 2})$ under three different true parameter configurations. Standard Gaussian priors are used for all arms. TV estimates shown have standard error at most $0.1$ times the TV estimate.
  • Figure 2: Average BvM TV distance for UCB on Bernoulli and Poisson bandits, under the same configurations as Figure 1. $\text{Beta}(1, 1)$ priors are used for the Bernoulli bandit and $\text{Gamma}(1, 1)$ priors for the Poisson bandit. The representative normal is centered at the true MLE, which is asymptotically equivalent to the local MLE used in the exponential-family BvM theorem. TV estimates shown have standard error at most $0.1$ times the TV estimate.
  • Figure 3: (Left) Average BvM TV distance and empirical coverage of the $95\%$ credible interval for the margin for Thompson Sampling in the two-batch two-arm Gaussian bandit setting with $10^4$ samples per batch. Error bars are $95\%$ confidence intervals over $2 \times 10^5$ replicates. The black dotted line marks the nominal coverage level. $\mathcal{N}(0, 1)$ priors are used. (Right) Average TV distance for Noisy Certainty Equivalent Control on the linear-quadratic regulator (LQR) (wang2021exact) under three different parameter configurations. Standard Gaussian priors are used for all arms. TV estimates shown have standard error at most $0.2$ times the TV estimate.
  • Figure 4: BvM TV distance for the Zozo dataset with prior $\text{Beta}(1, 1)$. TV estimates computed with $10^4$ samples and have standard error at most $0.004$.
  • Figure 5: Frequentist coverage of the Bayesian credible interval for the contextual bandit and LQR, under the same configurations as in the simulations section. Coverage estimates shown have standard error at most $0.004$.
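
The Python sketch below (NumPy plus the standard `math` module) follows the Bernoulli experiment described for Figure 2. It makes the following assumptions:
- UCB1, with index $\hat p_k + \sqrt{2\log t / n_k}$, stands in for the paper's UCB variant;
- the representative normal for arm $k$ is $\mathcal{N}(\hat p_k, \hat p_k(1-\hat p_k)/n_k)$, i.e. the inverse Fisher information at the MLE;
- the helper names are hypothetical.

With independent $\text{Beta}(1, 1)$ priors, the arms' posteriors are independent. The joint posterior is therefore sampled arm by arm, and TV is estimated with the identity $\mathrm{TV}(P, Q) = \mathbb{E}_P[(1 - q/p)_+]$.

```python
import math
import numpy as np


def ucb1_bernoulli(p_true, T, seed=None):
    """Run UCB1 on a Bernoulli bandit; return per-arm pull and success counts."""
    rng = np.random.default_rng(seed)
    K = len(p_true)
    pulls = np.zeros(K, dtype=int)
    wins = np.zeros(K, dtype=int)
    for t in range(T):
        if t < K:
            a = t  # initialise by pulling each arm once
        else:
            index = wins / pulls + np.sqrt(2.0 * np.log(t) / pulls)
            a = int(np.argmax(index))
        pulls[a] += 1
        wins[a] += int(rng.random() < p_true[a])
    return pulls, wins


def beta_logpdf(x, a, b):
    """Log-density of Beta(a, b) at x in (0, 1)."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1.0) * np.log(x) + (b - 1.0) * np.log1p(-x) - log_beta_fn


def normal_logpdf(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * ((x - mean) ** 2 / var + np.log(2.0 * np.pi * var))


def bvm_tv_bernoulli(pulls, wins, a0=1.0, b0=1.0, n_draws=200_000, seed=None):
    """Monte Carlo TV between the product Beta posterior and a product normal at the MLE."""
    rng = np.random.default_rng(seed)
    a_post = a0 + wins
    b_post = b0 + pulls - wins
    p_hat = wins / pulls
    var_hat = p_hat * (1.0 - p_hat) / pulls  # inverse Fisher information at the MLE
    if np.any(var_hat <= 0):
        raise ValueError("MLE on the boundary: normal approximation undefined")
    draws = rng.beta(a_post, b_post, size=(n_draws, len(pulls)))
    log_p = np.zeros(n_draws)
    log_q = np.zeros(n_draws)
    for k in range(len(pulls)):  # arms are independent a posteriori
        log_p += beta_logpdf(draws[:, k], a_post[k], b_post[k])
        log_q += normal_logpdf(draws[:, k], p_hat[k], var_hat[k])
    return float(np.mean(np.clip(1.0 - np.exp(log_q - log_p), 0.0, None)))


if __name__ == "__main__":
    pulls, wins = ucb1_bernoulli([0.3, 0.5], T=10_000, seed=0)
    print(pulls, bvm_tv_bernoulli(pulls, wins, seed=1))
```

The normal density is supported on the whole real line while the Beta density lives on $(0, 1)$. The identity is still valid, because $(p - q)_+$ vanishes wherever the posterior density $p$ is zero.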

Theorems & Definitions (19)

  • Theorem 1
  • Lemma 1
  • Theorem 2
  • Remark 1
  • Corollary 1
  • Theorem 3
  • Theorem 4
  • Proposition 1
  • Proof
  • Lemma 2