
Exact Sampling of Gibbs Measures with Estimated Losses

David T. Frazier, Jeremias Knoblauch, Jack Jewson, Christopher Drovandi

TL;DR

This work analyzes Gibbs-Bayesian posteriors built from losses that are often intractable and must be estimated via simulation. It shows that naive pseudo-marginal MCMC approaches incur a prohibitive dependence on the number of simulations used to estimate the loss, unless that number grows with the data size. The authors prove a precise scaling result: to recover standard posterior concentration, the number of simulations must grow as $m(n) \asymp n^{1/\kappa}$, which is often impractical. To overcome this, they develop a modified zig-zag sampler (a PDMP) that uses unbiased gradient estimators to draw samples from the exact Gibbs measure, with complexity linear in $n$ and no dependence on $m$. They demonstrate this approach on β-divergence and MMD examples in copula, regression, and Poisson settings. The work provides both theoretical guarantees and practical algorithms that substantially improve inference when losses are estimated via simulation.
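To make the zig-zag idea concrete, here is a minimal Python sketch. It is a toy, not the authors' modified sampler: the target is a standard normal, and the exact gradient $\theta$ is replaced at each proposed switch by a hypothetical unbiased estimate $\theta+\xi$ with $\xi\sim\mathrm{Uniform}(-c,c)$, standing in for a gradient estimated from simulated pseudo-observations. Because the noise is bounded, the switching rate has a deterministic upper bound, so switches can be proposed from that bound and accepted by Poisson thinning. The function name `zigzag_noisy_gradient` and the noise level `c` are illustrative assumptions.

```python
import numpy as np


def zigzag_noisy_gradient(total_time, c=1.0, theta0=0.0, seed=0):
    """Toy 1D zig-zag sampler for N(0, 1), i.e. potential U(theta) = theta^2 / 2.

    At every proposed switch the gradient U'(theta) = theta is replaced by the
    unbiased estimate theta + xi, xi ~ Uniform(-c, c) (hypothetical noise that
    stands in for simulation error). Returns time averages of theta and theta^2.
    """
    rng = np.random.default_rng(seed)
    theta, v, t = theta0, 1.0, 0.0
    int1 = int2 = 0.0  # running integrals of theta(s) and theta(s)^2
    while True:
        # Since v * xi <= c, the switching rate along theta + v*s is bounded by
        # max(0, a + s) with a = v * theta + c, whatever the noise draw is.
        a = v * theta + c
        e = rng.exponential()
        bound_at_event = np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        tau = bound_at_event - a  # first arrival of the bounding Poisson process
        tau_run = min(tau, total_time - t)
        # Exact integrals of the linear path theta + v*s over [0, tau_run].
        int1 += theta * tau_run + v * tau_run**2 / 2.0
        int2 += theta**2 * tau_run + theta * v * tau_run**2 + tau_run**3 / 3.0
        if t + tau >= total_time:
            return int1 / total_time, int2 / total_time
        theta += v * tau
        t += tau
        # Poisson thinning: accept the switch with probability rate / bound,
        # using a fresh unbiased gradient estimate at the proposed event.
        rate = max(0.0, v * (theta + rng.uniform(-c, c)))
        if rng.uniform() * bound_at_event < rate:
            v = -v
```

The effective switching rate is $\mathbb{E}_\xi[(v(\theta+\xi))^+]$, and since $x^+-(-x)^+=x$ it satisfies $\lambda(\theta,v)-\lambda(\theta,-v)=v\theta$, the zig-zag condition for the $N(0,1)$ target. Running it with `c=1.0` and `c=2.0` should give time averages near 0 and 1 in both cases: in this toy the noise level changes efficiency, not the stationary distribution, loosely mirroring the independence from $b$ described above.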

Abstract

In recent years, the shortcomings of Bayesian posteriors as inferential devices have received increased attention. A popular strategy for fixing them has been to instead target a Gibbs measure based on losses that connect a parameter of interest to observed data. However, existing theory for such inference procedures assumes these losses are analytically available, while in many situations these losses must be stochastically estimated using pseudo-observations. In such cases, we show that when standard Markov chain Monte Carlo algorithms are used to produce posterior samples, the resulting posterior exhibits strong dependence on the number of pseudo-observations: unless the number of pseudo-observations diverges sufficiently fast, the resulting posterior will concentrate very slowly. However, we show that in many situations it is feasible to alleviate this dependence entirely using a modified piecewise deterministic Markov process (PDMP) sampler, and we formally and empirically show that these samplers produce posterior draws that have no dependence on the number of pseudo-observations used to estimate the loss within the Gibbs measure. We apply our results to three examples that feature intractable likelihoods and model misspecification.
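As a concrete instance of a loss that must be estimated from pseudo-observations, the sketch below assumes an MMD loss (cf. Example 2) for a hypothetical $N(\theta,1)$ location model with a Gaussian kernel. Each evaluation simulates $m$ pseudo-observations from the model and plugs an unbiased MMD$^2$ estimate into the log Gibbs density. The learning rate `w`, the $N(0,10^2)$ prior, the kernel bandwidth, and the scaling of the loss by $n$ are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian-kernel Gram matrix between 1D samples a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth**2))


def mmd2_unbiased(x, z, bandwidth=1.0):
    """Unbiased (U-statistic) estimate of MMD^2 between the laws of x and z."""
    n, m = len(x), len(z)
    kxx = gaussian_gram(x, x, bandwidth)
    kzz = gaussian_gram(z, z, bandwidth)
    # Drop the diagonal terms k(x_i, x_i) so within-sample averages are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_zz = (kzz.sum() - np.trace(kzz)) / (m * (m - 1))
    term_xz = gaussian_gram(x, z, bandwidth).mean()
    return term_xx + term_zz - 2.0 * term_xz


def log_gibbs_density_hat(theta, x, m, rng, w=1.0, prior_sd=10.0):
    """Noisy, unnormalized log Gibbs density for a hypothetical N(theta, 1) model.

    Assumptions: N(0, prior_sd^2) prior, learning rate w, loss n * MMD^2.
    """
    z = rng.normal(theta, 1.0, size=m)  # m pseudo-observations from the model
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_prior - w * len(x) * mmd2_unbiased(x, z)
```

Calling `log_gibbs_density_hat` twice at the same `theta` returns different values; that randomness is the simulation noise whose effect on MCMC is the subject of the paper.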

Paper Structure

This paper contains 45 sections, 16 theorems, 71 equations, 10 figures, and 3 algorithms.

Key Result

Theorem 1

Suppose Assumptions [ass:limit_fun] through [ass:tails-new] hold and $m\asymp n^{1/\kappa}$. Then, for $\epsilon_n\rightarrow0$ with $n\epsilon_n\rightarrow\infty$, for $M_n>0$ large enough (possibly with $M_n\rightarrow\infty$ as $n\rightarrow\infty$), and for $\overline\Pi(A\mid\mathsf{L}_{m,n}):=\int_{A}\overline{\pi}(\theta\mid\mathsf{L}_{m,n})\,\mathrm{d}\theta$, …
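One way to see where a dependence on $m$ can come from, assuming the pseudo-marginal chain targets the prior times $\mathbb{E}[\exp(-n\hat\ell_m(\theta))]$, where $\hat\ell_m$ is an unbiased estimate of an average loss $\ell(\theta)$: by Jensen's inequality this expectation is not $\exp(-n\ell(\theta))$. In a hypothetical Gaussian-noise model $\hat\ell_m=\ell+\sigma Z/\sqrt{m}$, the log-gap is exactly $n^2\sigma^2/(2m)$. The sketch checks this by Monte Carlo. The noise model and its $n^2/m$ rate are specific to this toy and are not the paper's $n^{1/\kappa}$ condition.

```python
import numpy as np


def log_gap_monte_carlo(n, m, sigma, num_draws=200_000, seed=0):
    """Monte Carlo estimate of log E[exp(-n * lhat)] + n * ell, where
    lhat = ell + sigma * Z / sqrt(m) (hypothetical Gaussian simulation error).

    The true loss ell cancels, so only the noise term needs to be simulated.
    """
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal(num_draws) / np.sqrt(m)
    return np.log(np.mean(np.exp(-n * noise)))


def log_gap_closed_form(n, m, sigma):
    """Exact value from the Gaussian moment generating function."""
    return n**2 * sigma**2 / (2.0 * m)
```

With $m$ growing only proportionally to $n$ (say $m=n$ and $\sigma=0.5$), the toy gap still grows with $n$: 12.5 at $n=100$ and 125 at $n=1000$. If $\sigma$ varies with $\theta$, the gap varies with $\theta$ as well, so it reshapes the posterior rather than merely rescaling it.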

Figures (10)

  • Figure 1: $\operatorname{MMD}$-Bayes posterior density for $\rho$ in the Gaussian copula model. Throughout all panels, ZZ Gold should be thought of as corresponding to $\pi(\theta\mid\mathsf{L}_n)$; it was obtained using a long run of zig-zag (ZZ) sampling. For $n=100$, the top left panel displays bP-MCMC (PM) posterior approximations $\overline{\pi}(\theta\mid\mathsf{L}_{m,n})$ for $m\in \{2,10,50\}$, which should be contrasted with the zig-zag outcomes for $b \in \{2,10,50\}$ in the top right panel. While the bP-MCMC posterior approximations vary substantially with $m$, the choice of $b$ has no impact on the approximations obtained via zig-zag sampling. The bottom panels replicate the same phenomenon for the larger sample size $n=1000$. Although bP-MCMC is run with even larger values $m\in \{20,100,1000\}$ (bottom left), the zig-zag sampler performs more reliably, without requiring any increase in $b$ (bottom right).
  • Figure 2: We investigate computational efficiency by plotting approximation accuracy as a function of the number of times the algorithms draw a simulation from the model. Using a long run of zig-zag sampling as the gold standard and an essentially exact approximation of $\pi(\theta\mid\mathsf{L}_n)$, the plots chart the accuracy of the bP-MCMC (PM) and zig-zag (ZZ) samplers in the Gaussian copula model for $n=1000$ along the $y$-axis as the number of model simulations increases along the $x$-axis. In the top row, this is done by plotting differences between true and estimated posterior means for bP-MCMC with $m\in\{20,50,100,1000\}$ (top left) and zig-zag sampling with $b\in\{2,5,20,50\}$ (top right). Meanwhile, the bottom row compares the differences between true and estimated posterior standard deviations for bP-MCMC (bottom left) and zig-zag sampling (bottom right). While the zig-zag sampler's efficiency seems to behave similarly for all choices of $b$, the bP-MCMC sampler induces a trade-off between convergence speed and approximation accuracy that is determined by the magnitude of $m$.
  • Figure 3: Comparison of $\operatorname{MMD}$-Bayes posterior density for $\beta_1$ and $\log(\sigma)$ in the Gaussian linear regression model with $n=100$ as produced by bP-MCMC (PM) and zig-zag sampling (ZZ) for different choices of $m$ and $b$. Figure legend and interpretation as in Figure 1.
  • Figure 4: Accuracy of bP-MCMC (PM) and zig-zag sampling (ZZ) as the number of model simulations increases for $\beta_1$ and $n=100$ in the $\operatorname{MMD}$-Bayes regression example. Figure legend and interpretation as in Figure 2.
  • Figure 5: $\beta$-divergence posterior density for $\theta_1$ in the Poisson regression model. The left panel contains the bP-MCMC (PM) posterior approximations, while the right panel contains results for the zig-zag sampler (ZZ). Figure legend and interpretation as in Figure 1.
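Figure 5 concerns the $\beta$-divergence loss (Example 1) in a Poisson model. A standard form of the per-observation $\beta$-divergence loss is $-\frac{1}{\beta}f(y;\theta)^{\beta}+\frac{1}{1+\beta}\sum_{z}f(z;\theta)^{1+\beta}$. The sum over $z$ equals $\mathbb{E}_{Z\sim f(\cdot;\theta)}[f(Z;\theta)^{\beta}]$, so one illustrative way to avoid computing it is to average $f(Z;\theta)^{\beta}$ over $m$ simulated pseudo-observations. The sketch below does this for a single Poisson rate `lam`; the function names are hypothetical, and this is not the paper's regression implementation.

```python
import math

import numpy as np


def poisson_logpmf(y, lam):
    """log f(y; lam) for a Poisson(lam) distribution, elementwise over 1D y."""
    y = np.asarray(y, dtype=float)
    log_fact = np.array([math.lgamma(k + 1.0) for k in y])
    return y * math.log(lam) - lam - log_fact


def beta_loss_hat(y, lam, beta, m, rng):
    """Estimate of sum_i [ -(1/beta) f(y_i)^beta + 1/(1+beta) sum_z f(z)^(1+beta) ].

    The data term is computed exactly; the sum over z is replaced by an
    unbiased average of f(Z)^beta over m pseudo-observations Z ~ Poisson(lam).
    """
    data_term = -np.sum(np.exp(beta * poisson_logpmf(y, lam))) / beta
    z = rng.poisson(lam, size=m)  # m pseudo-observations from the model
    integral_hat = np.mean(np.exp(beta * poisson_logpmf(z, lam)))
    return data_term + len(y) * integral_hat / (1.0 + beta)
```

For a Poisson model the sum can also be truncated and computed directly, which makes this a convenient place to check the simulation-based estimate against an exact value.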

Theorems & Definitions (31)

  • Example 1: $\beta$-divergence
  • Example 2: MMD
  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Proof of Lemma [lem:restate]
  • Remark 1
  • Theorem 2
  • Proof of [thm:new]
  • Lemma 2: Poisson thinning (Lewis and Shedler, 1979)
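Lemma 2 refers to Poisson thinning (Lewis and Shedler, 1979), a standard device in PDMP samplers such as zig-zag for simulating event times from intensities that can only be evaluated pointwise. A minimal sketch, assuming a constant upper bound `rate_bound` on the intensity: propose candidate times from a homogeneous Poisson process with that rate, and accept each candidate with probability intensity/bound. The function name and the test intensity $1+\sin^2 t$ are illustrative.

```python
import numpy as np


def first_arrival_by_thinning(rate_fn, rate_bound, rng):
    """First event time of a Poisson process with intensity rate_fn(t),
    assuming 0 <= rate_fn(t) <= rate_bound for all t >= 0 (Lewis-Shedler thinning)."""
    t = 0.0
    while True:
        # Candidate from a homogeneous Poisson process with rate rate_bound.
        t += rng.exponential(1.0 / rate_bound)
        # Keep the candidate with probability rate_fn(t) / rate_bound.
        if rng.uniform() * rate_bound < rate_fn(t):
            return t
```

For intensity $1+\sin^2 t$ (bounded by 2), the probability of no event before $t=1$ is $\exp\{-(3/2-\sin(2)/4)\}\approx 0.28$, which the empirical frequency from repeated calls should approach.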