Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance

Hongjian Wang; Aaditya Ramdas

Abstract

In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma^2$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which uses generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an "e-process" (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ for the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of the resulting confidence sequences, which have a curious polynomial dependence on the error probability $\alpha$ that we prove to be not only unavoidable, but (for universal inference) even better than that of the classical fixed-sample t-test. Numerical experiments are provided along the way to compare and contrast the various approaches, including some recent suboptimal ones.
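To make the universal inference idea concrete, the following is a minimal sketch rather than the paper's exact construction. It forms the ratio of a running plug-in likelihood (each observation is scored by a Gaussian density whose parameters are estimated from earlier observations only) to the maximized likelihood under the null $\mu = \mu_0$. The function name `ui_ttest_log_eprocess`, the smoothed plug-in estimators, and the starting guesses `init_mu` and `init_sigma` are assumptions made for illustration; the paper's own choice of plug-in estimators may differ.

```python
import numpy as np

def ui_ttest_log_eprocess(x, mu0=0.0, init_mu=0.0, init_sigma=1.0):
    """Log of a universal-inference-style e-process for H0: mean == mu0.

    Numerator: product of Gaussian densities with predictable plug-in
    parameters (computed from x[:i] only). Denominator: Gaussian likelihood
    with mean mu0 and the null-maximizing variance. The smoothing by
    (init_mu, init_sigma) is an illustrative assumption.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty(len(x))
    log_num = 0.0
    s1 = s2 = sq0 = 0.0  # running sum, sum of squares, sum of (x - mu0)^2
    for i, xi in enumerate(x):
        # Plug-in estimates use only the first i observations (predictable).
        mu_hat = (init_mu + s1) / (i + 1)
        centered = max(s2 - s1 * s1 / i, 0.0) if i > 0 else 0.0
        var_hat = (init_sigma**2 + centered) / (i + 1)
        log_num += -0.5 * np.log(2 * np.pi * var_hat) - (xi - mu_hat) ** 2 / (2 * var_hat)
        # Update running statistics with the new observation.
        s1 += xi
        s2 += xi * xi
        sq0 += (xi - mu0) ** 2
        m = i + 1
        # Null MLE of the variance is sq0 / m; the maximized log-likelihood
        # is then -m/2 * (log(2*pi*var0) + 1).
        var0 = max(sq0 / m, 1e-300)
        log_den = -0.5 * m * (np.log(2 * np.pi * var0) + 1.0)
        out[i] = log_num - log_den
    return out

# Usage: data drawn away from the null should make the log e-process grow.
rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=500)
log_e = ui_ttest_log_eprocess(x, mu0=0.0)
```

Working on the log scale avoids overflow when the evidence against the null accumulates over many observations.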

Paper Structure (43 sections, 37 theorems, 161 equations, 6 figures, 3 tables)

Introduction
Preliminaries
Notations
Sequential Statistics
Test Processes and Their Maximal Inequalities
e-Power of Test Processes
Sequential t-Test and t-Confidence Sequences
Likelihood Ratio Martingales
t-Test e-Processes via Universal Inference
Sequential t-Tests via Scale Invariance
Lai's (1976) Confidence Sequence
Scale Invariant Filtration
Scale Invariant Likelihood Ratios for Location-Scale Families
An Extended Test Martingale for t-Test
Classical Test Martingales for t-Test
...and 28 more sections

Key Result

Lemma 2.1

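If $\{ M_n \}_{n \geqslant 0}$ is a nonnegative supermartingale (NSM) for $\mathcal{P}$, then for all $P \in \mathcal{P}$ and $\varepsilon \in (0, 1)$, $P[\exists n \geqslant 0 : M_n \geqslant M_0 / \varepsilon] \leqslant \varepsilon$. Consequently, if $\{ M_n \}_{n \geqslant 0}$ is an e-process for $\mathcal{P}$, then for all $P \in \mathcal{P}$ and $\varepsilon \in (0, 1)$, $P[\exists n \geqslant 0 : M_n \geqslant 1 / \varepsilon] \leqslant \varepsilon$.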
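As a quick numerical illustration of Ville's inequality, the sketch below simulates a simple test martingale and counts how often it ever reaches the level $1/\varepsilon$. The martingale $M_n = \exp(\lambda S_n - n \lambda^2 / 2)$ under i.i.d. standard Gaussian data, the function name `ville_crossing_frequency`, and the values of `lam`, `horizon`, and `n_paths` are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def ville_crossing_frequency(eps=0.05, lam=0.5, horizon=200, n_paths=2000, seed=0):
    """Fraction of simulated paths on which M_n >= 1/eps for some n <= horizon.

    M_n = exp(lam * S_n - n * lam**2 / 2) is a nonnegative martingale with
    M_0 = 1 when the increments of S_n are i.i.d. N(0, 1).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_paths, horizon))
    # Log of the martingale along each path, for n = 1, ..., horizon.
    log_m = np.cumsum(lam * x - 0.5 * lam**2, axis=1)
    crossed = (log_m >= np.log(1.0 / eps)).any(axis=1)
    return crossed.mean()

freq = ville_crossing_frequency()
print(f"empirical crossing frequency: {freq:.3f} (Ville bound: 0.05)")
```

Because the simulation stops at a finite horizon, the empirical frequency estimates a probability that is itself no larger than the bound $\varepsilon$, up to Monte Carlo error.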

Figures (6)

Figure 1: Logarithm of e-processes for the null $\mathcal{N}_{\mu = 0}$, averaged over 100 independent repeats. Dotted grey horizontal lines represent the rejection threshold $\log 20$ for $\alpha = 0.05$. The Gaussian mixture method is less sensitive to the prior choice, whereas universal inference exhibits a higher sensitivity to the plug-in estimators. The burn-in technique helps universal inference improve worst-case performance.
Figure 2: E-processes testing the null of no effect over three groups. Dotted grey horizontal lines represent the rejection threshold $\log 20$ for $\alpha = 0.05$. In the table, we compare p-values obtained via different methods. Asterisks denote p-values smaller than 0.05, with the associated $\tau_{\text{rej}}$ denoting the rejection time when the e-process first reaches 20 (or p-value first reaches 0.05).
Figure 3: Five classes of confidence sequences for t-test under $N_{0,1}$ observations.
Figure 4: This pair of plots studies the behavior of the width as one of $\alpha$ and $n$ varies while the other is held fixed. In the left plot, we plot the widths of 3 CSs against $\alpha$ at $n=500$; in the right plot, we plot the widths multiplied by $\sqrt{n}$ against the sample size $n$ at $\alpha=0.05$. Both are under $N_{0,1}$ observations, and lines indicate averages over 100 repeats.
Figure 5: Growth rates of the t-CS by universal inference (Theorem 3.2) and the classical t-test CI with $n=3$ observations.
...and 1 more figure
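The captions of Figures 1 and 2 refer to a rejection threshold of $\log 20$ at $\alpha = 0.05$ and to a rejection time $\tau_{\text{rej}}$. A minimal sketch of how these quantities can be read off a log e-process follows. The helper names `rejection_time` and `anytime_p_values` are hypothetical, and the conversion of an e-process into a p-value as the reciprocal of its running maximum (capped at 1) is the standard one, assumed here rather than quoted from the paper.

```python
import numpy as np

def rejection_time(log_e, alpha=0.05):
    """First (1-indexed) time the e-process reaches 1/alpha, or None if never."""
    log_e = np.asarray(log_e, dtype=float)
    hits = np.flatnonzero(log_e >= np.log(1.0 / alpha))
    return int(hits[0]) + 1 if hits.size else None

def anytime_p_values(log_e):
    """Anytime-valid p-values: min(1, 1 / running maximum of the e-process)."""
    running_max = np.maximum.accumulate(np.asarray(log_e, dtype=float))
    return np.minimum(1.0, np.exp(-running_max))

# Usage on a toy e-process path with values 0.5, 2, 25, 10.
toy_log_e = np.log([0.5, 2.0, 25.0, 10.0])
tau = rejection_time(toy_log_e)      # the path first reaches 20 at time 3
p = anytime_p_values(toy_log_e)      # nonincreasing sequence of p-values
```

With this conversion, the e-process reaching 20 and the p-value dropping to 0.05 happen at the same time, which is why a single $\tau_{\text{rej}}$ serves both descriptions.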

Theorems & Definitions (65)

Lemma 2.1: Ville's inequality
Lemma 2.2: Extended Ville's inequality
Definition 2.3: e-power
Lemma 2.4
Lemma 2.5
Lemma 2.6
Corollary 3.1: Universal inference Z-test martingale
Theorem 3.2: Universal inference t-test e-process
Proposition 3.3: e-power of the universal inference t-test e-process
Theorem 3.4: Universal inference one-sided t-test e-process
...and 55 more
