The Polynomial Stein Discrepancy for Assessing Moment Convergence

Narayan Srinivasan, Matthew Sutton, Christopher Drovandi, Leah F South

TL;DR

The paper tackles the challenge of efficiently assessing posterior approximations when $P$ is known only up to normalization, particularly for biased MCMC methods. It introduces the Polynomial Stein Discrepancy (PSD), a linear-time discrepancy built from a finite polynomial basis, with $\text{PSD}=\|\bar{z}\|_2$ and $z_k=\mathbb{E}_{Q}[\mathcal{A}P_k(X)]$ for $k=1,\ldots,J$, where $J=\binom{d+r}{d}-1$ and the $P_k$ are monomials up to degree $r$; equivalently, $\text{PSD}^2=\sum_{k=1}^{J} z_k^2$. In the Bernstein–von Mises limit, $\text{PSD}=0$ if and only if the first $r$ moments of $P$ and $Q$ match, enabling moment-focused goodness-of-fit (GOF) testing with either bootstrap or asymptotic distributions. The authors demonstrate that PSD achieves strong power for moment discrepancies, scales linearly with data, and substantially speeds up both GOF testing and hyper-parameter tuning for SG-MCMC, making it practical for large-scale Bayesian applications. They validate PSD against state-of-the-art approaches, showing competitive or superior performance in moment detection and substantial runtime gains, especially in higher dimensions.
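To make the definition above concrete, here is a minimal sketch of a plug-in PSD estimate in NumPy. It rests on assumptions not stated in this summary: the Stein operator is taken to be the second-order Langevin operator $\mathcal{A}g=\Delta g+\nabla g\cdot\nabla\log p$, each $z_k$ is estimated by a plain sample average (the paper's test statistic may instead use a U-statistic form), and the function names `multi_indices` and `psd` are hypothetical.

```python
import itertools
import numpy as np

def multi_indices(d, r):
    """Exponent tuples alpha with 1 <= |alpha| <= r; there are C(d+r, d) - 1 of them."""
    return [a for a in itertools.product(range(r + 1), repeat=d) if 1 <= sum(a) <= r]

def psd(x, score, r):
    """Plug-in PSD estimate (assumed Langevin Stein operator, see lead-in).

    x     : (n, d) float array of samples from Q
    score : (n, d) array of grad log p evaluated at each sample
    r     : maximum polynomial degree
    """
    n, d = x.shape
    z = []
    for alpha in multi_indices(d, r):
        a = np.array(alpha)
        val = np.zeros(n)
        for i in range(d):
            if a[i] >= 1:
                # First-order term: (d/dx_i of the monomial) * score_i
                e1 = a.copy()
                e1[i] -= 1
                val += a[i] * np.prod(x ** e1, axis=1) * score[:, i]
            if a[i] >= 2:
                # Laplacian term: second derivative of the monomial in x_i
                e2 = a.copy()
                e2[i] -= 2
                val += a[i] * (a[i] - 1) * np.prod(x ** e2, axis=1)
        z.append(val.mean())  # sample average approximating E_Q[A P_k(X)]
    z = np.array(z)
    return np.sqrt(np.sum(z ** 2)), z

# Usage: target P = N(0, 1), so the score is -x.
x = np.array([[1.0], [-1.0]])  # sample mean 0, second moment 1
value, z = psd(x, -x, r=2)     # both z_k vanish, so the estimate is 0
```

Each sample is visited once per monomial, so the cost is linear in the number of samples, and only the score $\nabla\log p$ is needed, which does not depend on the normalizing constant of $P$.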

Abstract

We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. Classical methods for assessing sample quality like the effective sample size are not appropriate for scalable Bayesian sampling algorithms, such as stochastic gradient Langevin dynamics, that are asymptotically biased. Instead, the gold standard is to use the kernel Stein Discrepancy (KSD), which is itself not scalable given its quadratic cost in the number of samples. The KSD and its faster extensions also typically suffer from the curse-of-dimensionality and can require extensive tuning. To address these limitations, we develop the polynomial Stein discrepancy (PSD) and an associated goodness-of-fit test. While the new test is not fully convergence-determining, we prove that it detects differences in the first r moments in the Bernstein-von Mises limit. We empirically show that the test has higher power than its competitors in several examples, and at a lower computational cost. Finally, we demonstrate that the PSD can assist practitioners to select hyper-parameters of Bayesian sampling algorithms more efficiently than competitors.

Paper Structure

This paper contains 20 sections, 4 theorems, 33 equations, 5 figures, 1 table.

Key Result

Corollary 3.0.1

Let $Z_1, \ldots, Z_{J}$ be i.i.d. random variables with $Z_i \sim \mathcal{N}(0, 1)$. Let $\mu := \mathbb{E}_{x \sim Q}[\tau(x)]$ and $\Sigma_r := \text{cov}_{x \sim r}[\tau(x)] \in \mathbb{R}^{J \times J}$ for $r \in \{P,Q\}$. Let $\{\omega_i\}_{i=1}^{J}$ be the eigenvalues of the covariance matri

Figures (5)

  • Figure 1: Type I error rate (a) and statistical power (b,c,d) for detecting discrepancies between the unit Gaussian $P$ and the sampling distribution $Q$ (see the main text for details).
  • Figure 2: Approximate posterior for mixture example with SGLD for varying step sizes and when sampling from the true posterior using MALA.
  • Figure 3: Step size selection results for SGLD using various methods.
  • Figure 4: Runtime for various testing methods where $P=\mathcal{N}(0_d,I_d)$ with $d=10$.
  • Figure 5: Type I error rate (a) and statistical power (b,c,d) for detecting discrepancies between the unit Gaussian $P$ and the sampling distribution $Q$, using the test based on the asymptotic distribution of the U-statistic.

Theorems & Definitions (8)

  • Corollary 3.0.1 (Jitkrittum et al., 2017)
  • proof
  • Proposition 1
  • Corollary 3.0.2
  • proof
  • proof
  • Corollary C.0.1
  • proof