Time-Uniform Confidence Spheres for Means of Random Vectors

Ben Chugg; Hongjian Wang; Aaditya Ramdas

TL;DR

The paper develops time-uniform confidence spheres, called confidence sphere sequences (CSSs), for the mean of random vectors in $\mathbb{R}^d$ under a martingale assumption on the mean, using a PAC-Bayesian framework to obtain anytime-valid bounds. It covers light-tailed regimes (sub-Gaussian, log-concave, and sub-$\psi$) with dimension-free or nearly dimension-free widths, and extends to time-varying means; iterated-logarithm rates are obtained via stitching. For heavy-tailed data (two moments only), it provides two semi-empirical CSSs and a sequential Catoni-Giulini estimator, all with dimension-free bounds, along with fixed-time optimizations and asymptotic guarantees. The resulting CSSs are closed-form and applicable to sequential tasks such as A/B testing, online learning, and anomaly detection; they do not require iid observations and hold under relatively weak moment conditions.

Abstract

We study sequential mean estimation in $\mathbb{R}^d$. In particular, we derive time-uniform confidence spheres -- confidence sphere sequences (CSSs) -- which contain the mean of random vectors with high probability simultaneously across all sample sizes. Our results include a dimension-free CSS for log-concave random vectors, a dimension-free CSS for sub-Gaussian random vectors, and CSSs for sub-$\psi$ random vectors (which includes sub-gamma, sub-Poisson, and sub-exponential distributions). Many of our results are optimal. For sub-Gaussian distributions we also provide a CSS which tracks a time-varying mean, generalizing Robbins' mixture approach to the multivariate setting. Finally, we provide several CSSs for heavy-tailed random vectors (two moments only). Our bounds hold under a martingale assumption on the mean and do not require that the observations be iid. Our work is based on PAC-Bayesian theory and inspired by an approach of Catoni and Giulini.
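
To make the phrase "simultaneously across all sample sizes" concrete, the coverage requirement can be written as below. This is a sketch in notation that is assumed here rather than taken from the abstract: $\hat{\mu}_t$ is a centre computed from the first $t$ observations, $r_t$ is a radius, $\mu$ is the mean, and $\alpha$ is the error level.

```latex
\[
  \underbrace{\Pr\bigl(\forall t \geqslant 1:\ \|\hat{\mu}_t - \mu\| \leqslant r_t\bigr) \geqslant 1 - \alpha}_{\text{time-uniform (CSS)}}
  \qquad\text{versus}\qquad
  \underbrace{\forall t \geqslant 1:\ \Pr\bigl(\|\hat{\mu}_t - \mu\| \leqslant r_t\bigr) \geqslant 1 - \alpha}_{\text{fixed-time}}
\]
```

The first statement implies the second but not conversely; it is the first that remains valid when the sample size is chosen adaptively, for example at a data-dependent stopping time.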

Paper Structure (50 sections, 35 theorems, 185 equations, 3 figures, 2 tables)

Introduction
Related work
Background and approach
Assumptions
Light-tailed random vectors
Sub-Gaussian bounds
Fixed-time optimization
Log-concave distributions and finite Orlicz norm
Sub-$\psi$ distributions
Time-varying means under sub-Gaussianity
Obtaining confidence ellipsoids
Heavy-tailed random vectors
A first semi-empirical CSS
Scalar setting
A semi-empirical CSS under symmetry
...and 35 more sections

Key Result

Proposition 1.1

For each $\theta\in\Theta$, assume that $Q(\theta) \equiv (Q_t(\theta))_{t\geqslant 1}$ is a nonnegative supermartingale with initial value 1. Consider a prior distribution $\nu$ over $\Theta$ (chosen before seeing the data). Then, with probability at least $1-\alpha$, we have that simultaneously fo
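
The hypothesis that each $Q(\theta)$ is a nonnegative supermartingale with initial value 1 is what allows statements that hold at all times at once: by Ville's inequality, such a process exceeds $1/\alpha$ at some time with probability at most $\alpha$. The sketch below checks this numerically for one illustrative process, the Gaussian exponential martingale $Q_t = \exp(\lambda S_t - t\lambda^2/2)$ with $S_t$ a sum of standard normal draws. The process, the function name `ville_crossing_rate`, and all parameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
import math
import random


def ville_crossing_rate(alpha, lam, horizon, n_paths, seed=0):
    """Estimate P(exists t <= horizon : Q_t >= 1/alpha) by simulation.

    Q_t = exp(lam * S_t - t * lam**2 / 2), with S_t a sum of t independent
    N(0, 1) draws, is a nonnegative martingale with Q_0 = 1.
    This particular process is an illustrative assumption.
    """
    rng = random.Random(seed)
    log_threshold = math.log(1.0 / alpha)
    crossings = 0
    for _ in range(n_paths):
        log_q = 0.0  # log Q_0 = 0, i.e. Q_0 = 1
        for _ in range(horizon):
            z = rng.gauss(0.0, 1.0)
            # One multiplicative update of Q_t, carried out on the log scale.
            log_q += lam * z - 0.5 * lam ** 2
            if log_q >= log_threshold:
                crossings += 1
                break
    return crossings / n_paths


alpha = 0.05
rate = ville_crossing_rate(alpha=alpha, lam=0.5, horizon=100, n_paths=2000)
print(f"empirical crossing rate: {rate:.4f} (Ville's bound: {alpha})")
```

Up to Monte Carlo error, the empirical rate should not exceed $\alpha$; the finite horizon and the discrete-time overshoot typically make it somewhat smaller.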

Figures (3)

Figure 1: Left: Comparison of Theorem \ref{thm:sub-gaussian-dim-free} and its stitched version, Theorem \ref{thm:lil-subG}, against the results of hsu2012tail. The latter is made time-uniform in two ways: by a union bound (dotted orange line) and by the doubling technique of duchi2024information (dotted purple line). We begin plotting the width at $t=150$ for scale purposes. We fix $\|\Sigma\|=1$ and take $\Tr(\Sigma) = 5$. Right: A comparison of estimators with iterated logarithm rates. We plot Theorem \ref{thm:lil-subG} against the bound of hsu2012tail (given iterated logarithm rates via duchi2024information) for various ratios of $\Tr(\Sigma^2)$ to $\Tr(\Sigma)$. As $\Tr(\Sigma^2)$ shrinks, Theorem \ref{thm:lil-subG} starts to be dominated by the bound of hsu2012tail. Simulation details can be found in Appendix \ref{sec:experiments}.
Figure 2: Left: Comparison between our sequential Catoni-Giulini estimator (Theorem \ref{thm:catoni-estimator}), geometric median-of-means (GMoM) minsker2015geometric, and tournament median-of-means (TMoM) lugosi2019mean. We make the MoM estimators time-uniform in two ways: with a naive union bound (solid lines), and via the doubling method of duchi2024information (DH Doubling). Even though it has optimal (fixed-time) rates, the TMoM estimator suffers because of large constants. Right: A closer look at the performance of the GMoM estimator compared to Theorem \ref{thm:catoni-estimator}. We assume that a practitioner knows either $\Tr(\Sigma)$ or $v^2$ (knowing both would imply knowledge of $\|\mu\|$), hence we set $\Tr(\Sigma) = v^2$ in the figures in order to compare their multipliers in the bound. Again, we make the GMoM time-uniform via (i) a union bound, and (ii) Duchi-Haque doubling. Simulation details can be found in Appendix \ref{sec:experiments}.
Figure 3: Left: The width of our empirical Bernstein CI as $n\to\infty$, which approaches its asymptotic width $W_\infty$. We use $\alpha=0.05$, $d=2$, and random vectors composed of two $\text{Beta(10,10)}$ distributed random variables. Right: Performance of our empirical Bernstein bound compared to the multivariate (non-empirical) Bernstein bound baseline, with oracle access to the true variance. Shaded areas show the standard deviation across 100 trials. The distributions are mixtures of betas and binomials. Mixture 2 has the lowest variance. As the variance decreases, our empirical bounds get tighter and approach the tightest known oracle bounds (black and red dotted lines) gross2011recovering, kohler2017sub.
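
The captions for Figures 1 and 2 mention making fixed-time bounds time-uniform with a union bound. A minimal sketch of that baseline, under stated assumptions: spend error $\alpha_t = \alpha / (t(t+1))$ at time $t$, so that $\sum_{t \geqslant 1} \alpha_t = \alpha$, and evaluate a fixed-time radius at level $\alpha_t$. The particular schedule, the function names, the value of `trace_sq`, and the form of `fixed_time_radius` (written in the style of the sub-Gaussian bound of hsu2012tail with unit variance proxy, and to be checked against that reference) are illustrative assumptions, not the paper's experimental code.

```python
import math


def alpha_schedule(t, alpha):
    """Error budget at time t; the series alpha / (t (t + 1)) sums to alpha."""
    return alpha / (t * (t + 1))


def fixed_time_radius(t, alpha, trace, trace_sq, op_norm):
    """Hypothetical fixed-time radius for the sample mean of t observations.

    Assumed form: sqrt((Tr(S) + 2 sqrt(Tr(S^2) L) + 2 ||S|| L) / t),
    with L = log(1 / alpha). Check against the cited reference before use.
    """
    log_term = math.log(1.0 / alpha)
    inner = trace + 2.0 * math.sqrt(trace_sq * log_term) + 2.0 * op_norm * log_term
    return math.sqrt(inner / t)


def union_bound_radius(t, alpha, **cov):
    """Time-uniform radius: the fixed-time radius at the smaller level alpha_t."""
    return fixed_time_radius(t, alpha_schedule(t, alpha), **cov)


# ||Sigma|| = 1 and Tr(Sigma) = 5 follow the Figure 1 caption;
# Tr(Sigma^2) = 3 is a made-up value for illustration.
cov = dict(trace=5.0, trace_sq=3.0, op_norm=1.0)
for t in (150, 1_000, 10_000):
    fixed = fixed_time_radius(t, 0.05, **cov)
    uniform = union_bound_radius(t, 0.05, **cov)
    print(f"t={t:>6}  fixed-time={fixed:.4f}  union-bound={uniform:.4f}")
```

With this schedule, $\log(1/\alpha_t)$ grows like $2\log t$, so the union bound pays a logarithmic factor inside the radius; stitched bounds with iterated-logarithm rates reduce that cost to order $\log\log t$.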

Theorems & Definitions (50)

Proposition 1.1: Corollary of Theorem 4, chugg2023unified
Remark 2.1
Theorem 2.2
Theorem 2.3
Remark 2.4
Corollary 2.5
Lemma 2.6
proof
Lemma 2.7
Theorem 2.8
...and 40 more
