Likelihood-free hypothesis testing
Patrik Róbert Gerber, Yury Polyanskiy
TL;DR
This work analyzes likelihood-free hypothesis testing (LFHT), where the hypotheses are accessible only through simulators, and derives a minimax trade-off between the number of simulated samples $n$ and real samples $m$, governed by the goodness-of-fit sample complexity $n_{\mathsf{GoF}}(\epsilon)$. It shows that LFHT can achieve constant-error testing without fully estimating $\mathbb P$ and $\mathbb Q$, provided $m\gg 1/\epsilon^2$ and $n$ meets the corresponding goodness-of-fit (GoF) and two-sample (TS) testing benchmarks, with the product $mn$ scaling as $n_{\mathsf{GoF}}^2(\epsilon)$. The central tool is Ingster's $L^2$-distance test, adapted to the LFHT setting via a unified projection-based statistic $T_{LF}$, together with reductions to GoF and two-sample testing that characterize the full feasibility region across regular distribution classes ($\mathcal{P}_{\mathsf{H}}, \mathcal{P}_{\mathsf{G}}, \mathcal{P}_{\mathsf{Db}}, \mathcal{P}_{\mathsf{D}}$), plus robustness and Hellinger extensions. The results show that LFHT interpolates between goodness-of-fit testing, two-sample testing, and density estimation, and they expose a phase transition for discrete distributions, indicating how to test without full distribution learning in simulator-based settings.
Abstract
Consider the problem of binary hypothesis testing. Given $Z$ coming from either $\mathbb P^{\otimes m}$ or $\mathbb Q^{\otimes m}$, to decide between the two with small probability of error it is sufficient, and in many cases necessary, to have $m\asymp 1/\epsilon^2$, where $\epsilon$ measures the separation between $\mathbb P$ and $\mathbb Q$ in total variation ($\mathsf{TV}$). Achieving this, however, requires complete knowledge of the distributions and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem which we call likelihood-free hypothesis testing, where access to $\mathbb P$ and $\mathbb Q$ is given through $n$ i.i.d. observations from each. In the case when $\mathbb P$ and $\mathbb Q$ are assumed to belong to a non-parametric family, we demonstrate the existence of a fundamental trade-off between $n$ and $m$ given by $nm\asymp n_{\mathsf{GoF}}^2(\epsilon)$, where $n_{\mathsf{GoF}}(\epsilon)$ is the minimax sample complexity of testing between the hypotheses $H_0:\, \mathbb P=\mathbb Q$ vs $H_1:\, \mathsf{TV}(\mathbb P,\mathbb Q)\geq\epsilon$. We show this for three families of distributions, in addition to the family of all discrete distributions for which we obtain a more complicated trade-off exhibiting an additional phase-transition. Our results demonstrate the possibility of testing without fully estimating $\mathbb P$ and $\mathbb Q$, provided $m \gg 1/\epsilon^2$.
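
To make the setting concrete, the following is a minimal sketch of a likelihood-free decision rule on a finite alphabet. It is an illustration under stated assumptions, not the statistic analyzed in the paper: it assumes symbols in $\{0,\dots,k-1\}$, uses plain histograms with no bias correction, and simply picks the hypothesis whose simulated histogram is closer to the real data in squared $L^2$ distance. The names `empirical_pmf`, `sq_l2`, and `lfht_decide` are hypothetical helpers introduced here.

```python
import random
from collections import Counter


def empirical_pmf(sample, k):
    """Empirical probability mass function of `sample` on {0, ..., k-1}."""
    counts = Counter(sample)
    return [counts[i] / len(sample) for i in range(k)]


def sq_l2(p, q):
    """Squared L2 distance between two probability vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lfht_decide(x, y, z, k):
    """Return 0 if z is closer (in L2) to the histogram of x, else 1.

    x: n simulated draws from P, y: n simulated draws from Q,
    z: m real draws from either P or Q. Assumption: this naive rule
    uses uncorrected histograms and is only meant as an illustration.
    """
    px, py, pz = (empirical_pmf(s, k) for s in (x, y, z))
    return 0 if sq_l2(px, pz) <= sq_l2(py, pz) else 1


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative distributions on 4 symbols (made up for this example).
    p = [0.4, 0.4, 0.1, 0.1]
    q = [0.1, 0.1, 0.4, 0.4]
    n, m = 1000, 200
    x = rng.choices(range(4), weights=p, k=n)  # simulator for P
    y = rng.choices(range(4), weights=q, k=n)  # simulator for Q
    z = rng.choices(range(4), weights=p, k=m)  # real data, here drawn from P
    print(lfht_decide(x, y, z, k=4))
```

The rule never evaluates a likelihood: it sees $\mathbb P$ and $\mathbb Q$ only through the simulated samples `x` and `y`, which is the access model described above. Varying `n` and `m` in such a simulation is one informal way to observe that fewer simulated samples can be offset by more real samples, and vice versa.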
