Gaussian Mean Testing under Truncation
Clément L. Canonne, Themis Gouleakis, Yuhao Wang, Joy Qiping Yang
TL;DR
The paper studies Gaussian mean testing under truncation: given samples from a $d$-dimensional Gaussian that are observed only when they fall in a truncation set $S$ removing an $\varepsilon$ fraction of the mass, decide whether the mean is zero or has norm at least $\alpha$. When the truncation set is unknown, the sample complexity exhibits a phase transition: if $\varepsilon\sqrt{\log(1/\varepsilon)}\lesssim\alpha$, standard testers achieve $n=\Theta(\sqrt{d})$; in the regime $\varepsilon\lesssim\alpha\lesssim\varepsilon\sqrt{\log(1/\varepsilon)}$, the task requires $n=\Theta(d)$ samples, i.e., it is as hard as learning; and if $\alpha\lesssim\varepsilon$, testing is impossible. In contrast, when the truncation set is known, a gradient-based maximum likelihood tester attains the optimal $n=O(\sqrt{d})$ across all regimes. The results show how truncation and prior knowledge of the truncation set change the complexity of testing, and they connect the problem to learning and robust statistics. They are relevant to truncated data in economics and the social sciences.
Abstract
We consider the task of Gaussian mean testing, that is, of testing whether a high-dimensional vector perturbed by white noise has large magnitude, or is the zero vector. This question, originating from the signal processing community, has recently seen a surge of interest from the machine learning and theoretical computer science communities, and is by now fairly well understood. What is much less understood, and the focus of our work, is how to perform this task under truncation: that is, when the observations (i.i.d. samples from the underlying high-dimensional Gaussian) are only observed when they fall in a given subset of the domain $\mathbb{R}^d$. This truncation model, previously studied in the context of learning (instead of testing) the mean vector, has a range of applications, in particular in Economics and Social Sciences. As our work shows, sample truncations affect the complexity of the testing task in a rather subtle and surprising way.
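To make the setting concrete, the following minimal Python sketch simulates the truncation model by rejection sampling and applies a classical untruncated mean-testing statistic, $\|\sum_i X_i\|^2 - nd$, to the samples. It is an illustration only and not the tester developed in this work: the function names, the half-space truncation set, the identity covariance, and the threshold $n^2\alpha^2/2$ are all assumptions chosen for the example.

```python
import numpy as np


def sample_truncated_gaussian(mu, in_set, n, rng):
    """Draw n samples from N(mu, I) conditioned on in_set (rejection sampling).

    in_set is a hypothetical membership oracle: it maps an (m, d) array of
    points to a boolean mask of length m.
    """
    d = mu.shape[0]
    kept, count = [], 0
    while count < n:
        batch = rng.standard_normal((n, d)) + mu
        accepted = batch[in_set(batch)]  # keep only the points that land in S
        kept.append(accepted)
        count += accepted.shape[0]
    return np.concatenate(kept)[:n]


def mean_test_statistic(x):
    """Classical statistic ||sum_i x_i||^2 - n*d.

    For untruncated samples from N(mu, I) its expectation is n^2 * ||mu||^2.
    """
    n, d = x.shape
    s = x.sum(axis=0)
    return float(s @ s - n * d)


def standard_mean_tester(x, alpha):
    """Return True ("far from zero") if the statistic exceeds n^2 * alpha^2 / 2.

    The threshold is an illustrative choice, halfway between the expected
    value under mean zero and under a mean of norm alpha (no truncation).
    """
    n = x.shape[0]
    return mean_test_statistic(x) > 0.5 * (n * alpha) ** 2
```

With, say, $d=100$, $n=10{,}000$ and $\alpha=0.2$, this tester separates an untruncated zero-mean Gaussian from one whose mean has norm $\alpha$. If zero-mean samples are instead truncated to the half-space $\{x : x_1 \le 1\}$, the conditional mean of the first coordinate shifts to roughly $-0.29$, so the same statistic would typically report "far" even though the underlying mean is zero. This is the kind of effect that makes small $\alpha$ relative to $\varepsilon$ problematic when the truncation set is unknown.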
