Table of Contents
Fetching ...

Optimality of the Half-Order Exponent in the Turing-Good Identities for Bayes Factors

Kensuke Okada

TL;DR

The paper identifies the half-order exponent $t=\tfrac{1}{2}$ as the unique, minimax-stable choice for validating Bayes-factor computations via the Turing--Good identities, by equalizing and bounding the second moments of the two diagnostic sides under mutual absolute continuity. It derives exact variance expressions in terms of the overlap $\rho=I(\tfrac{1}{2})$ and proves that the two-sided, split-sample half-order diagnostic has uniformly finite variance and, in the small-overlap regime, outperforms the standard one-sided Turing check. The work extends the half-order idea to a geometric bridge for normalizing-constant ratios and to importance-sampling diagnostics, where $\rho$ provides a distribution-free, interpretable measure of weight concentration. Simulations on binomial Bayes-factor workflows illustrate stable finite-sample behavior and sensitivity to simulator--evaluator mismatches, and the results motivate a practical default design: a symmetric two-sided half-order check with overlap estimation for robust, cost-aware diagnostics in Bayesian computation. Overall, the half-order framework offers a principled, stable basis for validating and designing ratio-estimation procedures under tail-robustness constraints.

Abstract

Bayes factors are widely computed by Monte Carlo, yet heavy-tailed sampling distributions can make numerical validation unreliable. The Turing--Good identities provide exact moment equalities for powers of a Bayes factor (a density ratio). When these identities are used as Good-check diagnostics, the power choice becomes a statistical design parameter. We develop a nonasymptotic variance theory for Monte Carlo evaluation of the identities and show that the half-order (square-root) power is uniquely minimax-stable: it equalizes variability across the two model orientations and is the only choice that guarantees finite second moments in a distribution-free worst-case sense over all mutually absolutely continuous model pairs. This yields a balanced two-sample half-order diagnostic that is symmetric in model labeling and has a uniform variance bound at fixed computational budget; in small-overlap regimes it is guaranteed to be no less efficient than the standard one-sided Turing check. Simulations for binomial Bayes factor workflows illustrate stable finite-sample behavior and sensitivity to simulator--evaluator mismatches. We further connect the half-order overlap viewpoint to stable primitives for normalizing-constant ratios and importance-sampling degeneracy summaries.

Optimality of the Half-Order Exponent in the Turing-Good Identities for Bayes Factors

TL;DR

The paper identifies the half-order exponent as the unique, minimax-stable choice for validating Bayes-factor computations via the Turing--Good identities, by equalizing and bounding the second moments of the two diagnostic sides under mutual absolute continuity. It derives exact variance expressions in terms of the overlap and proves that the two-sided, split-sample half-order diagnostic has uniformly finite variance and, in the small-overlap regime, outperforms the standard one-sided Turing check. The work extends the half-order idea to a geometric bridge for normalizing-constant ratios and to importance-sampling diagnostics, where provides a distribution-free, interpretable measure of weight concentration. Simulations on binomial Bayes-factor workflows illustrate stable finite-sample behavior and sensitivity to simulator--evaluator mismatches, and the results motivate a practical default design: a symmetric two-sided half-order check with overlap estimation for robust, cost-aware diagnostics in Bayesian computation. Overall, the half-order framework offers a principled, stable basis for validating and designing ratio-estimation procedures under tail-robustness constraints.

Abstract

Bayes factors are widely computed by Monte Carlo, yet heavy-tailed sampling distributions can make numerical validation unreliable. The Turing--Good identities provide exact moment equalities for powers of a Bayes factor (a density ratio). When these identities are used as Good-check diagnostics, the power choice becomes a statistical design parameter. We develop a nonasymptotic variance theory for Monte Carlo evaluation of the identities and show that the half-order (square-root) power is uniquely minimax-stable: it equalizes variability across the two model orientations and is the only choice that guarantees finite second moments in a distribution-free worst-case sense over all mutually absolutely continuous model pairs. This yields a balanced two-sample half-order diagnostic that is symmetric in model labeling and has a uniform variance bound at fixed computational budget; in small-overlap regimes it is guaranteed to be no less efficient than the standard one-sided Turing check. Simulations for binomial Bayes factor workflows illustrate stable finite-sample behavior and sensitivity to simulator--evaluator mismatches. We further connect the half-order overlap viewpoint to stable primitives for normalizing-constant ratios and importance-sampling degeneracy summaries.
Paper Structure (17 sections, 13 theorems, 80 equations, 4 figures, 2 tables)

This paper contains 17 sections, 13 theorems, 80 equations, 4 figures, 2 tables.

Key Result

Theorem 2.1

For any $t\in D$,

Figures (4)

  • Figure 1: Simulation study 1A (binomial example, correctly specified; $n=50$): fan charts of running Monte Carlo discrepancies as a function of $m$ (number of simulated datasets per hypothesis) across $R=10{,}000$ independent runs. Panel (A) shows $\widehat{\Delta}_{1/2}(m)=\widehat{\mathbb{E}}_{\mathcal{H}_2,\le m}[B^{1/2}]-\widehat{\mathbb{E}}_{\mathcal{H}_1,\le m}[B^{-1/2}]$. Panel (B) shows the reverse Turing discrepancy $\widehat{\mathbb{E}}_{\mathcal{H}_1,\le m}[B^{-1}]-1$. Panel (C) shows the reverse Good discrepancy $\widehat{\mathbb{E}}_{\mathcal{H}_1,\le m}[B^{-2}]-\widehat{\mathbb{E}}_{\mathcal{H}_2,\le m}[B^{-1}]$. Solid curves are means; shaded bands are central 50% (dark) and 90% (light) intervals; the horizontal dotted line indicates $0$ (the exact identity).
  • Figure 2: Simulation study 1B (binomial example; simulator--evaluator mismatch; $n=50$): fan charts of running Monte Carlo discrepancies across $R=10{,}000$ runs. Panels (A)--(C) match Figure \ref{['fig:sim1_outliers_sqrt']}. The vertical blue line indicates the first $m$ at which the central 90% band excludes $0$ (on the plotted $m$ grid).
  • Figure 3: Toy normalizing-constant ratio estimation under tail mismatch in the $\mathrm{Beta}(a,1)$ vs. $\mathrm{Unif}(0,1)$ family (Simulation study 3). We report $\log_{10}(\widehat{r}/r)$ over $R=2000$ independent repetitions at matched computational cost $N_{\text{total}}=2000$. Top row ($a=\tfrac{1}{2}$): forward breakdown. Panels (A)--(B) compare the forward one-sided estimator $\widehat{r}_{\mathrm{F}}=\frac{1}{N}\sum_{i=1}^N w(X_{2i})$ ($X_{2i}\sim p_2$; $N=2000$) to the half-order bridge $\widehat{r}_{1/2}$ ($n_1=n_2=1000$). Because $\mathbb{E}_{p_2}[w^2]=\infty$ at $a=\tfrac{1}{2}$, the forward estimator exhibits a pronounced right tail and many outliers. Bottom row ($a=3$): reverse breakdown. Panels (C)--(D) compare the reverse one-sided estimator $\widehat{r}_{\mathrm{R}}=\{\frac{1}{N}\sum_{i=1}^N w(X_{1i})^{-1}\}^{-1}$ ($X_{1i}\sim p_1$; $N=2000$) to the same half-order bridge estimator ($n_1=n_2=1000$). Because $\mathbb{E}_{p_1}[w^{-2}]=\infty$ for $a\ge 2$, the reverse estimator develops a heavy left tail (extreme underestimation on the log scale). In all panels, the half-order bridge remains tightly concentrated around $0$. Panel headers report the closed-form overlap $\rho^2=4a/(a+1)^2$ and the corresponding asymptotic RSD prediction from Proposition \ref{['prop:halforder-bridge-stability']}.
  • Figure 4: Importance-sampling weight concentration away from $t=\tfrac{1}{2}$ and stability anchored by the half-order overlap in the $\mathrm{Beta}(a,1)$ vs. $\mathrm{Unif}(0,1)$ family ($R=500$ replicates, $N=2000$). Top row (proposal-side): $X_i\sim p_2$ with $a=1/2$, comparing weights $w=W^t$ for $t\in\{1,\tfrac{1}{2}\}$. Bottom row (target-side): $X_i\sim p_1$ with $a=3$, comparing reverse weights $w=W^{t-1}$ for $t\in\{\tfrac{1}{4},\tfrac{1}{2}\}$. Left panels show Lorenz-type concentration curves $C(p)=\sum_{i=1}^{\lceil pN\rceil}\tilde{w}_{(i)}$ (median and 10/90% envelopes across replicates) on a log-$p$ scale; the diagonal $C(p)=p$ corresponds to uniform weights. Middle panels show boxplots of the normalized reciprocal squared-weight concentration $\kappa_N := 1/(N\sum_{i=1}^N \tilde{w}_i^2)$ (commonly reported as $\mathrm{ESS}/N$), with dotted reference lines at the half-order overlap benchmark $\rho^2=4a/(a+1)^2$. Right panels show the cumulative weight share carried by the largest 1% of weights ($S_{0.01}$); the dotted line is the uniform baseline $0.01$. In both stress tests, the half-order weights exhibit mild concentration and $\kappa_N$ close to $\rho^2$, whereas the tail-sensitive exponents yield substantial concentration and degraded squared-weight summaries.

Theorems & Definitions (26)

  • Theorem 2.1: Turing--Good moment identity; Good1985
  • proof
  • Corollary 2.2: Turing's theorem
  • Lemma 2.3: Basic bounds and equality characterization
  • proof
  • Lemma 2.4: Variance identities
  • proof
  • Corollary 2.5: Half-order equalization
  • proof
  • Lemma 2.6: CGF of $\log B$ and log-convexity of $I$
  • ...and 16 more