The Polynomial Stein Discrepancy for Assessing Moment Convergence
Narayan Srinivasan, Matthew Sutton, Christopher Drovandi, Leah F South
TL;DR
The paper tackles the problem of efficiently assessing posterior approximations when the target $P$ is known only up to a normalizing constant, particularly for samples produced by biased MCMC methods. It introduces the Polynomial Stein Discrepancy (PSD), a linear-time discrepancy built from a finite polynomial basis: $\text{PSD}=\lVert\bar{z}\rVert_2$, where $\bar{z}$ has components $z_k=\mathbb{E}_{Q}[\mathcal{A}P_k(X)]$ for $k=1,\dots,J$, the $P_k$ are the monomials up to degree $r$, and $J=\binom{d+r}{d}-1$; equivalently, $\text{PSD}^2=\sum_{k=1}^{J} z_k^2$. In the Bernstein–von Mises limit, $\text{PSD}=0$ if and only if the first $r$ moments of $P$ and $Q$ match, which enables moment-focused goodness-of-fit (GOF) testing using either a bootstrap or an asymptotic distribution for the test statistic. The authors demonstrate that the PSD achieves strong power against moment discrepancies, scales linearly with the number of samples, and substantially speeds up both GOF testing and hyper-parameter tuning for SG-MCMC, making it practical for large-scale Bayesian applications. They validate the PSD against state-of-the-art approaches, showing competitive or superior performance in detecting moment differences and substantial runtime gains, especially in higher dimensions.
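To make the definition concrete, the following is a minimal sketch of an empirical PSD in Python with NumPy. It is not the authors' implementation. It assumes that $\mathcal{A}$ is the second-order Langevin Stein operator, $\mathcal{A}g(x)=\Delta g(x)+\nabla\log p(x)\cdot\nabla g(x)$, and that the expectation under $Q$ is replaced by a sample average. The function names `multi_indices` and `psd` are hypothetical, chosen for illustration.

```python
import itertools
import numpy as np


def multi_indices(d, r):
    """Exponent vectors of all non-constant monomials of degree <= r in d variables."""
    return [a for a in itertools.product(range(r + 1), repeat=d) if 1 <= sum(a) <= r]


def psd(x, score, r):
    """Empirical PSD sketch.

    x     : (n, d) array of samples from Q
    score : (n, d) array with grad log p evaluated at each sample
    r     : maximum polynomial degree
    Assumes the second-order Langevin Stein operator (an assumption, see lead-in).
    """
    x = np.asarray(x, dtype=float)
    score = np.asarray(score, dtype=float)
    n, d = x.shape
    z = []
    for alpha in multi_indices(d, r):
        alpha = np.array(alpha)
        stein_vals = np.zeros(n)
        for i in range(d):
            a = alpha[i]
            if a == 0:
                continue  # monomial does not depend on x_i
            # score_i * d/dx_i of the monomial
            e1 = alpha.copy()
            e1[i] -= 1
            stein_vals += score[:, i] * a * np.prod(x ** e1, axis=1)
            # d^2/dx_i^2 of the monomial (Laplacian term)
            if a >= 2:
                e2 = alpha.copy()
                e2[i] -= 2
                stein_vals += a * (a - 1) * np.prod(x ** e2, axis=1)
        z.append(stein_vals.mean())  # z_k: sample average of A P_k
    z = np.array(z)
    return float(np.sqrt(np.sum(z ** 2)))


# Usage: standard normal target, so grad log p(x) = -x.
rng = np.random.default_rng(0)
good = rng.standard_normal((5000, 2))   # samples from the target
biased = good + 0.5                     # same samples with a shifted mean
psd_good = psd(good, -good, r=2)
psd_biased = psd(biased, -biased, r=2)
```

Each sample is visited once per monomial, so the cost is linear in the number of samples for fixed $d$ and $r$. In this Gaussian example the shifted samples should give a visibly larger value than the unshifted ones, in line with the moment-matching interpretation above.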
Abstract
We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. Classical methods for assessing sample quality, such as the effective sample size, are not appropriate for scalable Bayesian sampling algorithms that are asymptotically biased, such as stochastic gradient Langevin dynamics. Instead, the gold standard is the kernel Stein discrepancy (KSD), which is itself not scalable because its cost is quadratic in the number of samples. The KSD and its faster extensions also typically suffer from the curse of dimensionality and can require extensive tuning. To address these limitations, we develop the polynomial Stein discrepancy (PSD) and an associated goodness-of-fit test. While the new test is not fully convergence-determining, we prove that it detects differences in the first $r$ moments in the Bernstein–von Mises limit. We show empirically that the test has higher power than its competitors in several examples, and at a lower computational cost. Finally, we demonstrate that the PSD can help practitioners select hyper-parameters of Bayesian sampling algorithms more efficiently than competing methods.
