A New Look at Bayesian Testing

Jyotishka Datta; Nicholas G. Polson; Vadim Sokolov; Daniel Zantedeschi

TL;DR

This paper develops a unified Bayesian hypothesis-testing framework based on moderate-deviation (MD) theory, showing that optimal Bayes cutoffs grow on the MD scale as $t\sim\sqrt{\log n}$ rather than remaining fixed. It explains the Lindley paradox by showing how prior mass and MD-scale thresholds jointly govern Bayes risk, and shows how classical criteria such as Jeffreys' $\sqrt{\log n}$ threshold and the BIC penalty $(d/2)\log n$ arise naturally from this framework. The Rubin–Sethuraman risk calculus is extended to high-dimensional sparsity, goodness-of-fit (GOF), and model selection, connecting testing to information-theoretic ideas via Chernoff information, KL divergences, and the entropy concentration phenomenon. The results justify sample-size-adaptive significance levels and provide a decision-theoretic bridge between Bayesian testing and fixed-$\alpha$ Neyman–Pearson procedures, with implications for e-values, safe testing, and GOF analysis. Overall, the work unifies several classical results under moderate deviation analysis, offering a principled basis for adaptive thresholds and linking Bayesian testing to both compression and gambling interpretations.

Abstract

We develop a unified framework for Bayesian hypothesis testing through the theory of moderate deviations, providing explicit asymptotic expansions for Bayes risk and optimal test statistics. Our analysis reveals that Bayesian test cutoffs operate on the moderate deviation scale $\sqrt{\log n/n}$, in sharp contrast to the sample-size-invariant calibrations of classical testing. This fundamental difference explains the Lindley paradox and establishes the risk-theoretic superiority of Bayesian procedures over fixed-$\alpha$ Neyman-Pearson tests. We extend the seminal Rubin (1965) program to contemporary settings including high-dimensional sparse inference, goodness-of-fit testing, and model selection. The framework unifies several classical results: Jeffreys' $\sqrt{\log n}$ threshold, the BIC penalty $(d/2)\log n$, and the Chernoff-Stein error exponents all emerge naturally from moderate deviation analysis of Bayes risk. Our results provide theoretical foundations for adaptive significance levels and connect Bayesian testing to information theory through gambling-based interpretations.
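The Lindley paradox mentioned in the abstract can be illustrated numerically. The sketch below is a toy example, not the paper's construction: it assumes a conjugate normal prior $\theta \sim \mathcal{N}(0, \tau^2)$ under the alternative (chosen here only because the Bayes factor then has a closed form), with hypothetical defaults $\tau = \sigma = 1$. Holding the test statistic at the fixed-$\alpha$ cutoff $t = 1.96$, the Bayes factor in favor of $H_0$ grows roughly like $\sqrt{n}$.

```python
import math

def bf01_normal_prior(t, n, tau=1.0, sigma=1.0):
    """Bayes factor for H0: theta = 0 against theta ~ N(0, tau^2).

    Assumes xbar ~ N(theta, sigma^2 / n) and t = sqrt(n) * xbar / sigma.
    The normal prior is an illustrative assumption, not the paper's prior.
    """
    r = n * tau**2 / sigma**2  # ratio of prior variance to sampling variance
    return math.sqrt(1.0 + r) * math.exp(-0.5 * t**2 * r / (1.0 + r))

# Keep the statistic at the classical 5% two-sided cutoff and let n grow.
sample_sizes = [10, 100, 10_000, 1_000_000]
bayes_factors = [bf01_normal_prior(1.96, n) for n in sample_sizes]

for n, bf in zip(sample_sizes, bayes_factors):
    print(f"n = {n:>9,d}   BF01 at t = 1.96: {bf:.3f}")
```

Under these assumptions, a result that is "significant at 5%" for every $n$ moves from mild evidence against $H_0$ at small $n$ to strong evidence for $H_0$ at large $n$, which is the motivation for letting the cutoff grow with the sample size.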

Paper Structure (73 sections, 2 theorems, 126 equations, 1 table)


Introduction
Bayes versus $p$-values.
Type I and II Errors for Bayes Tests.
Moderate vs Large Deviations: The Key Distinction.
Three Deviation Regimes.
Fixed-$\alpha$ Classical Testing Uses CLT Calibration.
Bayes Risk Decomposition
Bayes vs Likelihood Ratio Tests
Bayes Tests vs Likelihood Ratio Tests.
Bayes Testing.
BIC as large-sample approximation.
Dawid's decomposition (Dawid, 1970).
Asymptotic calibration of the Bayes test.
Heuristics: thresholding and a toy normal example.
E-values and safe testing.
...and 58 more sections

Key Result

Theorem 1

Consider testing $H_0: \theta = 0$ versus $H_a: \theta \neq 0$ in a Gaussian model with $\bar{X}_n \sim \mathcal{N}(\theta, \sigma^2/n)$ and a Cauchy prior $\theta \sim C(0, \sigma)$ under $H_a$ (assuming $\sigma$ is known). The Bayes factor in favor of $H_0$, $BF_{01}$, is expressed in terms of the $t$-statistic $t = \sqrt{n}\bar{x}/\sigma$. Setting $BF_{01}\approx 1$ (the indifference boundary, ignoring the asym...
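As a hedged illustration of the critical value in Theorem 1, the sketch below assumes Jeffreys' classical large-sample approximation $BF_{01} \approx \sqrt{\pi n/2}\, e^{-t^2/2}$ for the Cauchy prior. That formula is supplied here as an assumption and is not quoted from the theorem statement. Solving $BF_{01} = 1$ under it gives $t^2 = \log(\pi n/2)$, so the cutoff grows like $\sqrt{\log n}$. The function names are hypothetical.

```python
import math

def jeffreys_bf01(t, n):
    """Assumed approximation: BF01 ~ sqrt(pi * n / 2) * exp(-t^2 / 2)."""
    return math.sqrt(math.pi * n / 2.0) * math.exp(-0.5 * t * t)

def critical_t(n):
    """Value of |t| at which the assumed BF01 equals 1 (requires n >= 1)."""
    return math.sqrt(math.log(math.pi * n / 2.0))

def implied_alpha(n):
    """Two-sided normal tail probability P(|Z| > critical_t(n))."""
    return math.erfc(critical_t(n) / math.sqrt(2.0))

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,d}   cutoff |t| = {critical_t(n):.3f}   "
          f"implied alpha = {implied_alpha(n):.5f}")
```

Under this approximation the implied significance level shrinks as $n$ grows, in contrast to a fixed $\alpha = 0.05$ rule with its constant cutoff of 1.96.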

Theorems & Definitions (4)

Theorem 1: Bayesian Critical Value
proof
Theorem 2: Dawid
proof
