Time-uniform central limit theory and asymptotic confidence sequences

Ian Waudby-Smith, David Arbour, Ritwik Sinha, Edward H. Kennedy, Aaditya Ramdas

TL;DR

This paper introduces time-uniform analogues of asymptotic confidence intervals based on the central limit theorem. The resulting asymptotic confidence sequences (CSs) provide valid inference at arbitrary stopping times, and the paper derives a universal asymptotic CS that requires only weak CLT-like assumptions.

Abstract

Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under weak assumptions and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals, adding to the literature on confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time -- which provide valid inference at arbitrary stopping times and incur no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, enjoying finite-sample guarantees but not the aforementioned broad applicability of asymptotic confidence intervals. This work provides a definition for "asymptotic CSs" and a general recipe for deriving them. Asymptotic CSs forgo nonasymptotic validity for CLT-like versatility and (asymptotic) time-uniform guarantees. While the CLT approximates the distribution of a sample average by that of a Gaussian for a fixed sample size, we use strong invariance principles (stemming from the seminal 1960s work of Strassen) to uniformly approximate the entire sample average process by an implicit Gaussian process. As an illustration, we derive asymptotic CSs for the average treatment effect in observational studies (for which nonasymptotic bounds are essentially impossible to derive even in the fixed-time regime) as well as randomized experiments, enabling causal inference in sequential environments.

Paper Structure

This paper contains 51 sections, 32 theorems, 258 equations, 10 figures.

Key Result

Theorem 2.2

Suppose $(Y_t)_{t=1}^\infty \sim \mathbb P$ is an infinite sequence of i.i.d. observations from a distribution $\mathbb P$ with mean $\mu$ and finite variance. Let $\widehat{\mu}_t := \frac{1}{t} \sum_{i=1}^t Y_i$ be the sample mean, and $\widehat{\sigma}_t^2 := \frac{1}{t}\sum_{i=1}^t Y_i^2 - (\widehat{\mu}_t)^2$ the sample variance. Then [...] forms a $(1-\alpha)$-AsympCS for $\mu$.
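
To make the quantities in the theorem concrete, the following is a minimal Python sketch that tracks $\widehat{\mu}_t$ and $\widehat{\sigma}_t^2$ as running sums and wraps them in a time-uniform interval. The interval itself is cut off in the statement above, so the boundary used here, a Gaussian-mixture radius $\widehat{\sigma}_t \sqrt{\frac{2(t\rho^2+1)}{t^2\rho^2}\log(\sqrt{t\rho^2+1}/\alpha)}$ with a user-chosen $\rho > 0$, is an assumption and should be checked against the full theorem. The function name `gaussian_mixture_asympcs` and its defaults are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def gaussian_mixture_asympcs(
    ys: Iterable[float], alpha: float = 0.1, rho: float = 1.0
) -> List[Tuple[float, float]]:
    """Return one (lower, upper) pair per observation.

    ASSUMPTION: the radius below is a Gaussian-mixture boundary; it is not
    shown in the theorem statement above and must be verified against it.
    """
    bounds: List[Tuple[float, float]] = []
    total = 0.0      # running sum of Y_i
    total_sq = 0.0   # running sum of Y_i^2
    for t, y in enumerate(ys, start=1):
        total += y
        total_sq += y * y
        mean = total / t
        # Sample variance as defined in the theorem: (1/t) sum Y_i^2 - mean^2.
        # Clamp at zero to guard against tiny negative rounding errors.
        var = max(total_sq / t - mean * mean, 0.0)
        t_rho2 = t * rho ** 2
        # Assumed boundary: sqrt( 2(t rho^2 + 1) / (t^2 rho^2) * log(sqrt(t rho^2 + 1) / alpha) )
        scale = 2.0 * (t_rho2 + 1.0) / (t ** 2 * rho ** 2)
        radius = math.sqrt(var) * math.sqrt(
            scale * math.log(math.sqrt(t_rho2 + 1.0) / alpha)
        )
        bounds.append((mean - radius, mean + radius))
    return bounds
```

For example, `gaussian_mixture_asympcs([0.0, 1.0] * 50)[-1]` returns an interval centered at 0.5, and the interval at time 100 is narrower than the one at time 10.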

Figures (10)

  • Figure 1: The left plot shows one run of a single experiment: an asymptotic CS alongside an asymptotic CI for a parameter of interest (in this case, the average treatment effect (ATE) of 0.4, an example we expand on in \ref{section:csate}). The true value of the ATE is covered by the CS simultaneously from time 30 to 10000. On the other hand, the CI fails to cover the true ATE at several points in time. By repeating such an experiment hundreds of times, one obtains the right plot, which displays the cumulative probability of miscoverage, i.e., the probability of the CS or CI failing to capture the true ATE at any time up to $t$. Notice that the CI error rate begins at $\alpha=0.1$ and quickly grows, while the CS error rate never exceeds $\alpha=0.1$.
  • Figure 2: A $90\%$-AsympCS for the time-varying mean $\widetilde{\mu}_t$ using \ref{theorem:lindeberg-martingale-asympcs} with $\rho$ optimized for $t^\star = 500$ based on the exact solution of \ref{section:optimizingMixture}. Here, we have set $\mu_t := \frac{1}{2}(1 - \sin(2 \log(e + 10t)) / \log(e + 0.01t))$ to produce the sinusoidal behavior of $\widetilde{\mu}_t$. Notice that $\widetilde{C}_t$ uniformly captures $\widetilde{\mu}_t$, adapting to its non-stationarity.
  • Figure 3: A schematic illustrating sequential sample splitting. At each time step $t$, the new observation $Z_t$ is randomly assigned to ${\mathcal{D}}_\infty^\mathrm{trn}$ or ${\mathcal{D}}_\infty^\mathrm{eval}$ with equal probability (1/2). Nuisance function estimators $(\widehat{\mu}_{{T'}}^1, \widehat{\mu}_{{T'}}^0, \widehat{\pi}_{{T'}})$ are constructed using ${\mathcal{D}}_\infty^\mathrm{trn}$ which then yield $\widehat{f}_{{T'}}$. The sample-split estimator $\widehat{\psi}_t^\mathrm{split}$ is defined as the sample average $\frac{1}{{T}} \sum_{i=1}^{T} \widehat{f}_{{T'}}(Z_i^\mathrm{eval})$ where each $Z_i^\mathrm{eval} \in {\mathcal{D}}_\infty^\mathrm{eval}$.
  • Figure 4: Three 90%-AsympCSs for the average treatment effect in a simulated randomized experiment using different regression estimators. Notice that all three confidence sequences uniformly capture the average treatment effect $\psi$, but more sophisticated models do so more efficiently, with AIPW+stacking greatly outperforming IPW.
  • Figure 5: Three 90%-AsympCSs for the ATE in an observational study using three different estimators: a difference-in-means estimator, AIPW with parametric models, and AIPW with an ensemble of predictors combined via stacking. Unlike the randomized setup, only the stacking ensemble is consistent, since the other two are misspecified. Not only is the stacking-based AsympCS converging to $\psi$, but it is also the tightest of the three at each time step.
  • ...and 5 more figures
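
The peeking effect described in Figure 1 can be reproduced in miniature. The sketch below is not the paper's experiment: it assumes i.i.d. standard normal data with true mean 0 rather than an ATE, uses the usual 90% CLT interval $\widehat{\mu}_t \pm 1.645\,\widehat{\sigma}_t/\sqrt{t}$, and estimates the cumulative probability that this interval has excluded the truth at some time up to $t$. The function name `cumulative_ci_miscoverage` and all of its parameters are hypothetical.

```python
import math
import random
from typing import List, Optional


def cumulative_ci_miscoverage(
    n_runs: int = 200,
    t_max: int = 1000,
    t_start: int = 30,
    z: float = 1.645,   # two-sided 90% normal quantile (assumed alpha = 0.1)
    seed: int = 0,
) -> List[float]:
    """Fraction of runs whose fixed-time CI has missed the true mean (0)
    at least once in [t_start, t], for t = t_start, ..., t_max."""
    rng = random.Random(seed)
    first_miss: List[Optional[int]] = []
    for _ in range(n_runs):
        total = 0.0
        total_sq = 0.0
        miss: Optional[int] = None
        for t in range(1, t_max + 1):
            y = rng.gauss(0.0, 1.0)  # ASSUMED data model: N(0, 1)
            total += y
            total_sq += y * y
            if t < t_start:
                continue  # only start "peeking" at t_start
            mean = total / t
            sd = math.sqrt(max(total_sq / t - mean * mean, 0.0))
            # The CI excludes the true mean 0 when |mean| exceeds its half-width.
            if abs(mean) > z * sd / math.sqrt(t):
                miss = t
                break  # only the first miss matters for cumulative miscoverage
        first_miss.append(miss)

    rates: List[float] = []
    for t in range(t_start, t_max + 1):
        missed = sum(1 for m in first_miss if m is not None and m <= t)
        rates.append(missed / n_runs)
    return rates
```

Under these assumptions the returned curve is nondecreasing by construction, and its final value typically sits well above the nominal $\alpha = 0.1$, which is the behavior the right plot of Figure 1 attributes to the CI.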

Theorems & Definitions (74)

  • Definition 2.1: Asymptotic confidence sequences
  • Remark 1: Why almost surely?
  • Theorem 2.2: Gaussian mixture asymptotic confidence sequence
  • Proposition 2.3: Iterated logarithm asymptotic confidence sequences
  • Theorem 2.4: An abstract AsympCS for well-approximated processes
  • Proposition 2.5: Lindeberg-type Gaussian mixture martingale AsympCS
  • Corollary 2.6: Lyapunov-type AsympCS
  • Definition 2.7: Asymptotic time-uniform coverage
  • Theorem 2.8: Asymptotic $(1-\alpha)$-coverage for Gaussian mixture AsympCSs
  • Proposition 2.9: Delayed-start AsympCS
  • ...and 64 more