Mean Estimation in Banach Spaces Under Infinite Variance and Martingale Dependence
Justin Whitehouse, Ben Chugg, Diego Martinez-Taboada, Aaditya Ramdas
TL;DR
This paper addresses mean estimation for sequences of Banach-space-valued, heavy-tailed observations with potentially infinite variance under martingale dependence. It extends a simple truncation-based estimator by centering it around a naive mean estimate, and it proves time-uniform line-crossing and iterated-logarithm concentration bounds that depend on a centered $p$th moment with $p\in(1,2]$, yielding dimension-free guarantees. The main contributions are a general template bound for the estimator, a Banach-space martingale-based analysis with explicit constants, and a law-of-the-iterated-logarithm refinement that matches the line-crossing bounds at all times up to a doubly logarithmic factor. Empirically, the estimator is competitive with geometric median-of-means and tournament median-of-means (MoM) while supporting efficient online updates and remaining robust to martingale dependence, which makes it practical for heavy-tailed, high-dimensional settings.
Abstract
We consider estimating the shared mean of a sequence of heavy-tailed random variables taking values in a Banach space. In particular, we revisit and extend a simple truncation-based mean estimator first proposed by Catoni and Giulini. While existing truncation-based approaches require a bound on the raw (non-central) second moment of observations, our results hold under a bound on either the central or non-central $p$th moment for some $p \in (1,2]$. Our analysis thus handles distributions with infinite variance. The main contributions of the paper follow from exploiting connections between truncation-based mean estimation and the concentration of martingales in smooth Banach spaces. We prove two types of time-uniform bounds on the distance between the estimator and unknown mean: line-crossing inequalities, which can be optimized for a fixed sample size $n$, and iterated logarithm inequalities, which match the tightness of line-crossing inequalities at all points in time up to a doubly logarithmic factor in $n$. Our results do not depend on the dimension of the Banach space, hold under martingale dependence, and all constants in the inequalities are known and small.
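To make the idea of truncation-based mean estimation concrete, the following is a minimal sketch, not the authors' exact estimator. It assumes finite-dimensional vectors with the Euclidean norm (the paper treats general smooth Banach spaces), and the names `truncate`, `truncated_mean`, `center`, and `radius` are hypothetical. In particular, how to choose the truncation level `radius` from the moment bound and the sample size is not specified here.

```python
import math
import random


def truncate(x, radius):
    """Shrink x onto the Euclidean ball of the given radius (norm clipping)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [scale * v for v in x]


def truncated_mean(samples, center, radius):
    """Average the truncated, centered samples and shift back by the center.

    Assumptions for this sketch: `center` is a rough initial guess of the
    mean (for example, a naive mean of a few held-out points), and `radius`
    is a user-chosen truncation level.
    """
    total = [0.0] * len(center)
    count = 0
    for x in samples:
        shifted = [xi - ci for xi, ci in zip(x, center)]
        clipped = truncate(shifted, radius)
        # Running sum, so the estimate can be updated one sample at a time.
        total = [t + c for t, c in zip(total, clipped)]
        count += 1
    return [ci + t / count for ci, t in zip(center, total)]


if __name__ == "__main__":
    random.seed(0)
    # Pareto draws with shape 1.5 have a finite mean but infinite variance.
    data = [[random.paretovariate(1.5), random.paretovariate(1.5)]
            for _ in range(1000)]
    # Naive center from the first 50 points; the rest are truncated.
    guess = [sum(col) / 50 for col in zip(*data[:50])]
    print(truncated_mean(data[50:], guess, radius=20.0))
```

Truncation bounds the influence of any single observation, which is what allows concentration arguments to go through without a finite variance. Centering around an initial guess means the truncation acts on deviations from that guess rather than on the raw observations.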
