
Besag-Clifford e-values for unnormalized testing

Alexander Dombowsky, Barbara E. Engelhardt, Aaditya Ramdas

Abstract

Unnormalized probability distributions are frequently used in machine learning to model complex data-generating processes. Although Markov chain Monte Carlo (MCMC) algorithms can approximately sample from unnormalized distributions, the intractability of their normalizing constants renders likelihood ratio testing infeasible. We propose using the parallel method of Besag and Clifford to generate samples that are exchangeable with the data under the null, and then using those samples to construct e-values that are valid for any number of iterations or algorithmic steps. We show that, as the number of samples grows, these Besag-Clifford e-values constructed using the unnormalized likelihood ratio are log-optimal up to a multiplicative term that diminishes with the mixing time of the Markov chain. Additionally, averaging over the output of multiple chains retains validity while increasing the e-power. We extend Besag-Clifford e-values to the general problem of unnormalized test statistics, which allows application to composite hypotheses, uncertainty quantification, generative model evaluation, and sequential testing. Through simulations and an application to galaxy velocity modeling, we empirically verify our theory, explore the impact of autocorrelation and mixing, and evaluate the performance of Besag-Clifford e-values.

Paper Structure (33 sections, 13 theorems, 66 equations, 8 figures, 2 algorithms)

Key Result

Proposition 1

For testing a null $\mathcal{P}$ against an alternative $\mathcal{Q}$ using observed data $X$, if $(X, Y^{(1)}, \dots, Y^{(M)})$ are exchangeable for any $P \in \mathcal{P}$, then the Besag-Clifford e-value $\widehat{E}_M(X)$ is an e-variable for any $M \in \mathbb{N}$.
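
The validity claim rests on a symmetry argument: for any nonnegative statistic $T$, the ratio of $T(X)$ to the average of $T$ over $X$ and the $M$ draws has expectation one whenever the $M+1$ variables are exchangeable. The sketch below checks this numerically. It assumes this particular ratio form, which may differ from the exact expression in the paper, and it uses a hypothetical statistic (the $\mathcal{N}(1,1)$ versus $\mathcal{N}(0,1)$ likelihood ratio up to a constant) with i.i.d. null draws as the simplest exchangeable case. The function names are illustrative.

```python
import numpy as np

def exchangeable_e_value(t_x, t_y):
    """Assumed form: (M+1) * T(X) / (T(X) + sum_m T(Y^(m))), with T >= 0."""
    t_y = np.asarray(t_y, dtype=float)
    total = t_x + t_y.sum()
    if total == 0.0:
        return 1.0  # convention when every statistic is zero
    return (t_y.size + 1) * t_x / total

def unnormalized_lr(x):
    # Hypothetical T: N(1,1) vs N(0,1) density ratio, known only up to a constant.
    return np.exp(x)

rng = np.random.default_rng(0)
M, reps = 20, 20000
null_values = np.empty(reps)
for r in range(reps):
    draws = rng.standard_normal(M + 1)  # i.i.d. N(0,1), hence exchangeable
    t = unnormalized_lr(draws)
    null_values[r] = exchangeable_e_value(t[0], t[1:])

# Monte Carlo estimate of the null expectation; should be close to 1.
mean_under_null = null_values.mean()
```

Because the ratio is invariant to rescaling $T$, the unknown normalizing constant cancels, which is why an unnormalized statistic suffices in this form.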

Figures (8)

  • Figure 1: Scatter plots of $1,000$ simulated values of the Poisson$(1)$ and Poisson$(1.1)$ likelihood ratio and $\widehat{E}_M$ for $M \in \{ 10, 100, 500, 1000 \}$, where $n=100$, and the data are simulated from the alternative. The red line in each plot indicates the identity map.
  • Figure 2: Graphical representation of Algorithm \ref{alg:besag-clifford-parallel}. Starting from $X$, we evolve the chain backwards for $J$ steps to generate $Y^{(0)}$. From $Y^{(0)}$, the chain is evolved forwards for $J$ steps in parallel, producing $Y^{(1)}, \dots, Y^{(M)}$. $X$ and the draws are exchangeable when $X \sim P$.
  • Figure 3: A demonstration of Theorem \ref{thm:Besag-parallel} for the AR$(1)$ process. The top row compares $\widehat{E}_M(X)$ to $E(X)$ and the bottom row compares $\Delta^1(Y^{(0)}) \widehat{E}_M(X)$ to $E(X)$, where $M=1000$. The correlation parameter $\phi$ is varied over $\{0.3, 0.5, 0.8\}$, and we simulate $1,000$ independent replications from the $\mathcal{N}(1,1)$ distribution.
  • Figure 4: Estimated power (with logarithms taken for visualization) of $\mathbf{1}(\widehat{E}_M(X) \geq 1/0.05)$ for different choices of the number of steps ($J$) and samples ($M$) in Algorithm \ref{alg:besag-clifford-parallel} and the AR$(1)$ process. The dashed horizontal line is the estimate of $Q(E(X)\geq 1/0.05)$.
  • Figure 5: Averages and 95% confidence bands of $500$ replications of the Besag-Clifford ULR e-process under the alternative, $\mathcal{N}(0,1)$, for $n=50$ time points. We increase the number of chains $S \in \{1,4,10\}$. The dotted line shows $\log(1/0.05)$.
  • ...and 3 more figures
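
The parallel construction described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm verbatim: the null is taken to be $\mathcal{N}(0,1)$ independently per coordinate, the Markov kernel is a hypothetical AR$(1)$ update that is reversible with respect to $\mathcal{N}(0,1)$ (so the backward and forward steps share one kernel), and the names `ar1_step` and `besag_clifford_parallel` are illustrative.

```python
import numpy as np

def ar1_step(y, phi, rng):
    """One AR(1) transition; reversible w.r.t. N(0,1) when |phi| < 1."""
    return phi * y + np.sqrt(1.0 - phi**2) * rng.standard_normal(y.shape)

def besag_clifford_parallel(x, M, J, phi, rng):
    """Run J steps back from x to a hub Y^(0), then M parallel chains of J steps."""
    hub = np.atleast_1d(np.asarray(x, dtype=float))
    for _ in range(J):
        hub = ar1_step(hub, phi, rng)  # backward step equals forward step (reversibility)
    ys = np.tile(hub, (M, 1))          # every chain starts at the same hub
    for _ in range(J):
        ys = ar1_step(ys, phi, rng)    # independent noise for each chain
    return ys                          # shape (M, n)

rng = np.random.default_rng(1)
x = rng.standard_normal(5)             # data drawn from the assumed null N(0,1)^5
ys = besag_clifford_parallel(x, M=8, J=3, phi=0.5, rng=rng)
```

In this Gaussian setting exchangeability shows up as equal pairwise correlations: the correlation between $X$ and any $Y^{(m)}$, and between any two draws, is $\phi^{2J}$, so smaller $\phi$ or larger $J$ gives draws that are closer to independent of the data.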

Theorems & Definitions (24)

  • Proposition 1
  • proof
  • Theorem 1
  • Theorem 2
  • Proposition 2
  • Proposition 3
  • proof
  • Proposition 4
  • Proposition 5
  • proof
  • ...and 14 more