Optimal E-Values for Exponential Families: the Simple Case

Peter Grünwald; Tyron Lardy; Yunda Hao; Shaul K. Bar-Lev; Martijn de Jong

TL;DR

This paper derives a general, checkable condition under which simple e-variables, i.e. likelihood ratios between a simple alternative $Q$ and a single null element $P_{\mu^*}$, exist for composite exponential-family nulls. The authors construct an auxiliary exponential family $\mathcal{Q}$ that contains $Q$ and shares the sufficient statistic of the null, and compare the covariance matrices $\Sigma_p(\bm\mu)$ and $\Sigma_q(\bm\mu)$ of the two families. They show that $q(U)/p_{\mu^*}(U)$ is a globally growth-rate optimal (GRO) e-variable whenever $\Sigma_p(\bm\mu)-\Sigma_q(\bm\mu)$ is positive semidefinite for all relevant $\bm\mu$. They establish eight equivalent conditions, including KL-divergence inequalities and log-partition function comparisons, and demonstrate the result across diverse settings: Gaussian location with shared or distinct covariance, Gaussian and Poisson $k$-sample tests, Bernoulli cases, Gaussian scale, natural exponential families (NEFs), and a linear-regression model. The work unifies and extends prior results on simple e-variables, which are easy to compute and support anytime-valid testing, and provides a framework for mixture-based composites and sequential testing. In practice, it offers computable GRO e-variables for a broad class of hypothesis tests in exponential-family models, with clear criteria for when they exist and how to construct them.

Abstract

We provide a general condition under which e-variables in the form of a simple-vs.-simple likelihood ratio exist when the null hypothesis is a composite, multivariate exponential family. Such 'simple' e-variables are easy to compute and expected-log-optimal with respect to any stopping time. Simple e-variables were previously only known to exist in quite specific settings, but we offer a unifying theorem on their existence for testing exponential families. We start with a simple alternative $Q$ and a regular exponential family null. Together these induce a second exponential family $\mathcal{Q}$ containing $Q$, with the same sufficient statistic as the null. Our theorem shows that simple e-variables exist whenever the covariance matrices of $\mathcal{Q}$ and the null are in a certain relation. A prime example in which this relation holds is testing whether a parameter in a linear regression is 0. Other examples include some $k$-sample tests, Gaussian location and scale tests, and tests for more general classes of natural exponential families. While in all these examples the implicit composite alternative is also an exponential family, in general this is not required.
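The covariance relation can be checked by hand in the simplest Gaussian location setting. The following is a minimal sketch, not taken from the paper. It assumes a univariate null $\{N(\mu,1) : \mu \in \mathbb{R}\}$ and a simple alternative $Q = N(m,s^2)$, for which the KL-closest null element is $N(m,1)$. Under these assumptions the expected likelihood ratio under $N(\mu,1)$ has the closed form $\exp\bigl((s^2-1)(\mu-m)^2/2\bigr)$, which is at most 1 for every $\mu$ exactly when $s^2 \le 1$, that is, when the null variance minus the alternative variance is nonnegative. All function names are illustrative.

```python
import math


def log_normal_pdf(u, mean, var):
    """Log-density of N(mean, var) at u."""
    return -((u - mean) ** 2) / (2.0 * var) - 0.5 * math.log(2.0 * math.pi * var)


def expected_lr_closed_form(mu, m, s2):
    """E_{U ~ N(mu, 1)}[q(U) / p_m(U)] with q = N(m, s2) and p_m = N(m, 1).

    Assumed toy setting (not the paper's code): unit-variance Gaussian null.
    """
    return math.exp(0.5 * (s2 - 1.0) * (mu - m) ** 2)


def expected_lr_numeric(mu, m, s2, n=20001):
    """Same expectation via a Riemann sum, as an independent check.

    The integrand p_mu(u) * q(u) / p_m(u) is proportional to a Gaussian
    density with mean m + s2 * (mu - m) and variance s2, so the grid is
    centred there and spans 12 standard deviations on each side.
    """
    center = m + s2 * (mu - m)
    half_width = 12.0 * math.sqrt(s2)
    du = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        u = center - half_width + i * du
        log_integrand = (
            log_normal_pdf(u, mu, 1.0)
            + log_normal_pdf(u, m, s2)
            - log_normal_pdf(u, m, 1.0)
        )
        total += math.exp(log_integrand)
    return total * du


if __name__ == "__main__":
    m = 0.5
    for s2 in (0.25, 1.0, 4.0):  # alternative variance below, equal to, above the null's
        for mu in (-2.0, 0.5, 1.5):
            exact = expected_lr_closed_form(mu, m, s2)
            approx = expected_lr_numeric(mu, m, s2)
            print(f"s2={s2}, mu={mu}: closed form {exact:.6f}, numeric {approx:.6f}")
```

With $s^2 > 1$ the expectation exceeds 1 for every $\mu \neq m$, so in this toy setting the likelihood ratio against $N(m,1)$ is not an e-variable.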

Paper Structure (28 sections, 5 theorems, 65 equations, 4 figures)

Introduction
E-variables
Main Result and Overview
Formal Setting
The Composite Alternative Generated by A Simple One
Existence of Simple Local E-Variables
Existence of Simple Global E-Variables (Main Result)
Simplifying Situations
Examples
Zero Curvature
Multivariate Gaussian Location with shared covariance, constrained null
Gaussian and Poisson k-sample tests
Constant Curvature: Multivariate Gaussian Location, distinct covariance
Nonconstant Curvature: Univariate Examples
More k-Sample Tests
...and 13 more sections

Key Result

Proposition 1

Fix a probability measure $Q$ on $U$. If there exists a simple e-variable relative to $Q$, then it must be the GRO e-variable for testing $\mathcal{P}$ against the alternative $\{Q\}$.
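A simple e-variable can be made concrete with a two-sample Poisson test. The sketch below is illustrative and not the paper's construction verbatim. It assumes a null under which $X_1, X_2$ are independent Poisson with a common rate $\lambda$, and a simple alternative $Q$ with rates $(m_1, m_2)$. The KL-closest null element has rate $\lambda^* = (m_1+m_2)/2$, and the likelihood ratio reduces to $(m_1/\lambda^*)^{x_1}(m_2/\lambda^*)^{x_2}$. Its expectation under any null rate $\lambda$ is $\exp\bigl(\lambda((m_1+m_2)/\lambda^* - 2)\bigr) = 1$, which the code checks by truncated summation. The function names and the truncation point `x_max` are assumptions.

```python
import math


def poisson_log_pmf(x, rate):
    """Log-pmf of Poisson(rate) at the nonnegative integer x."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def log_simple_lr(x1, x2, m1, m2):
    """Log of q(x1, x2) / p_{lam_star}(x1, x2) with lam_star = (m1 + m2) / 2.

    The exponential factors cancel because m1 + m2 = 2 * lam_star.
    """
    lam_star = 0.5 * (m1 + m2)
    return x1 * math.log(m1 / lam_star) + x2 * math.log(m2 / lam_star)


def expected_lr_under_null(lam, m1, m2, x_max=200):
    """E[LR] when X1, X2 are i.i.d. Poisson(lam), summed over 0..x_max.

    x_max is an assumed truncation point, adequate for moderate rates.
    """
    total = 0.0
    for x1 in range(x_max + 1):
        for x2 in range(x_max + 1):
            log_term = (
                poisson_log_pmf(x1, lam)
                + poisson_log_pmf(x2, lam)
                + log_simple_lr(x1, x2, m1, m2)
            )
            total += math.exp(log_term)
    return total


if __name__ == "__main__":
    for lam in (0.5, 3.0, 10.0):
        print(f"lambda={lam}: E[LR] = {expected_lr_under_null(lam, 2.0, 5.0):.9f}")
```

Since the expectation equals 1 for every null rate in this setting, the ratio is an e-variable, and by Proposition 1 it is then the GRO e-variable against $\{Q\}$.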

Figures (4)

Figure 1: The family $\mathcal{Q}$ for various $(m,s^2)$. The coordinate grid represents the parameters of the full Gaussian family, the horizontal line shows the parameter space of $\mathcal{P}$, the sloped lines show the parameters of the distributions in $\mathcal{Q}$, and the dashed lines show the projection of $(m,s^2)$ onto the parameter space of $\mathcal{P}$. For example, we may start out with $Q$ expressing $U \sim N(m,s^2)$ with $m= -3.0, s^2 =9.0$, represented as the green dot on the green line. Its RIPr onto $\mathcal{P}$ is the green point on the yellow line. The corresponding family $\mathcal{Q}$, constructed in terms of $Q$ and $\mathcal{P}$, is depicted by the green solid line. The theorem implies that the likelihood ratio between any point on the green line and its RIPr onto the yellow line is an e-variable; similarly for the red and blue lines.
Figure 2: The family $\mathcal{Q}$ for various ${\bm \mu}^*$. The coordinate grid represents the parameters of the full $2$-sample Bernoulli family, the straight line shows the parameter space of $\mathcal{P}$, the curved lines show the parameters of the distributions in $\mathcal{Q}$, and the dashed lines show the projection of ${\bm \mu}^*$ onto the parameter space of $\mathcal{P}$.
Figure 3: The expected value of $q_\mu(U)/p_\mu(U)$ under the null $P_{\mu'}$ for varying $\mu'$.
Figure 4: The coordinate grid represents the parameters of the full $2$-sample Bernoulli family. The straight yellow line shows the parameter space of $\mathcal{P}$. The green point represents a single alternative point $(m_1,m_2)=(0.375,0.625)$. The shaded red area represents the effect size $\mu_2-\mu_1 > m_2-m_1=0.25$ and the shaded green and red areas together represent $\log( \frac{\mu_2}{1-\mu_2} \frac{1-\mu_1}{\mu_1})>\log( \frac{m_2}{1-m_2} \frac{1-m_1}{m_1})$. The curved green line shows the parameters of the distributions in $\mathcal{Q}$ (which coincides with the boundary of the effect size region). Finally, the dashed line shows the projection of $(m_1,m_2)$ onto the parameter space of $\mathcal{P}$.

Theorems & Definitions (5)

Proposition 1
Proposition 2
Theorem 1
Proposition 3
Corollary 1
