Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance

Hongjian Wang, Aaditya Ramdas

Abstract

In 1976, Lai constructed a nontrivial confidence sequence for the mean $μ$ of a Gaussian distribution with unknown variance $σ^2$. Curiously, he employed both an improper (right Haar) mixture over $σ$ and an improper (flat) mixture over $μ$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an "e-process" (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $σ$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious polynomial dependence on the error probability $α$ that we prove to be not only unavoidable, but (for universal inference) even better than the classical fixed-sample t-test. Numerical experiments are provided along the way to compare and contrast the various approaches, including some recent suboptimal ones.

Paper Structure

This paper contains 43 sections, 37 theorems, 161 equations, 6 figures, 3 tables.

Key Result

Lemma 2.1

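If $\{ M_n \}_{n \geqslant 0}$ is an NSM for $\mathcal{P}$, then for all $P \in \mathcal{P}$ and $\varepsilon \in (0, 1)$, $P[\exists n \geqslant 0 : M_n \geqslant M_0 / \varepsilon] \leqslant \varepsilon$. Consequently, if $\{ M_n \}_{n \geqslant 0}$ is an e-process for $\mathcal{P}$, then for all $P \in \mathcal{P}$ and $\varepsilon \in (0, 1)$, $P[\exists n \geqslant 0 : M_n \geqslant 1 / \varepsilon] \leqslant \varepsilon$.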
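Ville's inequality (Lemma 2.1) says that a nonnegative supermartingale started at 1 reaches the level $1/\varepsilon$ at some time with probability at most $\varepsilon$. The following is a minimal simulation sketch of that statement, not code from the paper. It assumes a simple known-variance setting chosen only for illustration: the data are i.i.d. $N(0, 1)$ under the null, and the test martingale is the likelihood ratio against a fixed alternative mean `delta`. The function name and all parameter values are hypothetical.

```python
import math
import random


def crossing_frequency(alpha=0.05, delta=0.5, horizon=200, n_paths=2000, seed=0):
    """Estimate how often a likelihood-ratio test martingale reaches 1/alpha.

    Illustrative assumption: data are N(0, 1) under the null, and the
    martingale is M_n = exp(delta * S_n - n * delta**2 / 2), where S_n is
    the running sum of the observations. M_0 = 1 and E[M_n] = 1 under the null.
    """
    rng = random.Random(seed)
    log_threshold = math.log(1.0 / alpha)
    crossings = 0
    for _ in range(n_paths):
        log_m = 0.0  # log M_0 = 0
        for _ in range(horizon):
            x = rng.gauss(0.0, 1.0)  # one observation drawn under the null
            log_m += delta * x - 0.5 * delta ** 2
            if log_m >= log_threshold:
                crossings += 1
                break  # only the first crossing matters
    return crossings / n_paths


freq = crossing_frequency()
print(f"fraction of null paths that ever reach 1/alpha: {freq:.4f}")
```

Up to Monte Carlo error, the printed fraction should not exceed `alpha`. The finite `horizon` only truncates the time-uniform event, so it can make the observed frequency smaller but not larger.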

Figures (6)

  • Figure 1: Logarithm of e-processes for the null $\mathcal{N}_{\mu = 0}$, averaged over 100 independent repeats. Dotted grey horizontal lines represent the rejection threshold $\log 20$ for $\alpha = 0.05$. The Gaussian mixture method is less sensitive to the prior choice, whereas universal inference exhibits a higher sensitivity to the plug-in estimators. The burn-in technique helps universal inference improve worst-case performance.
  • Figure 2: E-processes testing the null of no effect over three groups. Dotted grey horizontal lines represent the rejection threshold $\log 20$ for $\alpha = 0.05$. In the table, we compare p-values obtained via different methods. Asterisks denote p-values smaller than 0.05, with the associated $\tau_{\text{rej}}$ denoting the rejection time when the e-process first reaches 20 (or p-value first reaches 0.05).
  • Figure 3: Five classes of confidence sequences for t-test under $N_{0,1}$ observations.
  • Figure 4: This pair of plots studies the behavior of width as one of $\alpha$ and $n$ varies, holding the other fixed. In the left plot, we plot widths of 3 CSs against $\alpha$ at $n=500$; whereas in the right plot, we plot widths multiplied by $\sqrt{n}$ against sample size $n$ at $\alpha=0.05$. Both are under $N_{0,1}$ observations, repeated 100 times. Lines indicate averages over 100 repeats.
  • Figure 5: Growth rates of the t-CS by universal inference (Theorem 3.2) and the classical t-test CI with $n=3$ observations.
  • ...and 1 more figure
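
The abstract describes a universal inference e-process that uses the maximum likelihood estimate of $\sigma$ under the null, and the figure captions use the rejection threshold $\log 20$ for $\alpha = 0.05$. The sketch below illustrates the general universal inference recipe for the null $\mu = \mu_0$ with unknown $\sigma$: a running product of one-step-ahead plug-in Gaussian densities, divided by the null likelihood maximized over $\sigma$. It is an illustration under stated assumptions, not the paper's exact construction. The smoothed plug-in estimators (with the hypothetical starting guesses `mu_init` and `sigma_init`) and the function names are choices made for this example.

```python
import math
import random


def ui_ttest_log_eprocess(xs, mu0=0.0, mu_init=0.0, sigma_init=1.0):
    """Log of a universal-inference-style e-process for H0: mean = mu0.

    Numerator: product of N(mean_est, var_est) densities, where the
    estimates use only past observations (so they are predictable).
    Denominator: Gaussian likelihood at mean mu0, maximized over sigma.
    The smoothing by (mu_init, sigma_init) is an illustrative assumption.
    """
    log_num = 0.0
    sum_x = sum_x2 = sum_null_sq = 0.0
    log_e = []
    for k, x in enumerate(xs):  # k = number of past observations
        mean_est = (mu_init + sum_x) / (k + 1)
        # Sum of squared deviations of the past data around mean_est.
        ss = sum_x2 - 2.0 * mean_est * sum_x + k * mean_est ** 2
        var_est = (sigma_init ** 2 + max(ss, 0.0)) / (k + 1)
        log_num += (-0.5 * math.log(2.0 * math.pi * var_est)
                    - (x - mean_est) ** 2 / (2.0 * var_est))
        sum_x += x
        sum_x2 += x * x
        sum_null_sq += (x - mu0) ** 2
        n = k + 1
        null_var = sum_null_sq / n  # MLE of sigma^2 under the null
        if null_var == 0.0:
            log_e.append(-math.inf)  # maximized null likelihood is unbounded
            continue
        log_den = -0.5 * n * (math.log(2.0 * math.pi * null_var) + 1.0)
        log_e.append(log_num - log_den)
    return log_e


def first_rejection_time(log_e, alpha=0.05):
    """Return the first n (1-indexed) with e-value >= 1/alpha, or None."""
    threshold = math.log(1.0 / alpha)
    for n, value in enumerate(log_e, start=1):
        if value >= threshold:
            return n
    return None


rng = random.Random(1)
xs = [rng.gauss(1.0, 1.0) for _ in range(300)]  # data with true mean 1
log_e = ui_ttest_log_eprocess(xs, mu0=0.0)
print("first rejection time:", first_rejection_time(log_e))
```

For any fixed $\sigma$, the maximized null likelihood is at least the likelihood at that $\sigma$, so the ratio is bounded above by a likelihood-ratio martingale under $N(\mu_0, \sigma^2)$. That bound is what makes the ratio an e-process for the composite null. The choice of plug-in estimators affects how fast the process grows under the alternative, but not its validity.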

Theorems & Definitions (65)

  • Lemma 2.1: Ville's inequality
  • Lemma 2.2: Extended Ville's inequality
  • Definition 2.3: e-power
  • Lemma 2.4
  • Lemma 2.5
  • Lemma 2.6
  • Corollary 3.1: Universal inference Z-test martingale
  • Theorem 3.2: Universal inference t-test e-process
  • Proposition 3.3: e-power of the universal inference t-test e-process
  • Theorem 3.4: Universal inference one-sided t-test e-process
  • ...and 55 more