Distribution-free two-sample testing with blurred total variation distance

Rohan Hore; Rina Foygel Barber

TL;DR

This work tackles the challenge of distribution-free two-sample testing by introducing blurred total variation, a smoothing-based relaxation of the classical TV distance. It provides distribution-free lower and upper confidence bounds for blurred TV, along with Monte Carlo estimators and bandwidth-adaptive schemes that maintain validity without distributional assumptions. A key insight is that inference quality depends on intrinsic rather than ambient dimension, enabling meaningful guarantees when data lie on or near a low-dimensional structure. The approach offers practical tools for hypothesis testing and model evaluation in high-dimensional nonparametric settings, with proofs relegated to the appendix. Overall, blurred TV serves as a principled, tractable surrogate for TV that preserves interpretability while enabling assumption-free inference.

Abstract

Two-sample testing, where we aim to determine whether two distributions are equal or not equal based on samples from each one, is challenging if we cannot place assumptions on the properties of the two distributions. In particular, certifying equality of distributions, or even providing a tight upper bound on the total variation (TV) distance between the distributions, is impossible to achieve in a distribution-free regime. In this work, we examine the blurred TV distance, a relaxation of TV distance that enables us to perform inference without assumptions on the distributions. We provide theoretical guarantees for distribution-free upper and lower bounds on the blurred TV distance, and examine its properties in high dimensions.

Paper Structure (47 sections, 18 theorems, 222 equations, 4 figures)

Introduction
Total variation distance.
Non-trivial DF-UCB on total variation distance is impossible
The blurred total variation distance
Prior work on blurred TV.
Our contributions
Organization of the paper
Properties of blurred TV distance
The role of the bandwidth
Convergence of the empirical blurred TV
DF-UCB and DF-LCB for blurred TV
Monte Carlo approximation of empirical blurred TV
Uniform validity of confidence bounds across bandwidths
Bandwidth-adaptive confidence bounds at a fixed $h$
Numerical experiment
...and 32 more sections

Key Result

Theorem 1.1

Fix $\alpha\in[0,1]$, any $d\geq 1$, and any $n,m\geq 1$. Let $\hat{U}_\alpha$ be any (possibly randomized) distribution-free upper confidence bound for $\mathrm{d}_{\mathrm{TV}}(\cdot,\cdot)$. Then, for any pair of distributions $P,Q\in \mathcal{P}_d$ satisfying $\textnormal{atom}(P)\cap\textnormal

Figures (4)

Figure 1: Near-monotonic behavior of blurred TV with bandwidth $h$. Here $P=\mathcal{N}(1,1)$, $Q=\mathcal{N}(-1,1)$, with $\mathrm{d}_{\mathrm{TV}}(P,Q) = \Phi(1)-\Phi(-1) \approx 0.683$, marked with a $\star$ in the figure. $\psi$ is either the Gaussian kernel (left), or a multimodal kernel, given by a density of the mixture distribution $\tfrac{1}{3}\,\mathcal{N}(-4,1)+\tfrac{1}{3}\,\mathcal{N}(0,1)+\tfrac{1}{3}\,\mathcal{N}(4,1)$ (right).
Figure 2: Monte Carlo based confidence bounds on $\mathrm{d}_{\mathrm{TV}}^h(P,Q)$. In each plot, $\mathrm{d}_{\mathrm{TV}}(P,Q)$ is marked by a $\star$ symbol. See Section \ref{sec:simulation} for simulation details.
Figure 3: A visualization of the results of Section \ref{sec:curse_of_dimensionality}, for distributions $P,Q$ with bounded density on the unit ball.
Figure 4: Effect of dimension on the empirical blurred TV $\mathrm{d}_{\mathrm{TV}}^h(\widehat{P}_n,\widehat{Q}_m)$. See Section \ref{sec:simulation_dimension} for simulation details.
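
The Gaussian-kernel setting of Figure 1 can be checked by hand, and the short sketch below does so. It rests on an assumption not spelled out in this summary: that the blurred TV at bandwidth $h$ is the TV distance between the two distributions after each is convolved with the kernel scaled by $h$, here taken to be $\mathcal{N}(0,h^2)$. Under that assumption, $P=\mathcal{N}(1,1)$ and $Q=\mathcal{N}(-1,1)$ become $\mathcal{N}(\pm 1, 1+h^2)$, and the TV distance between two equal-variance Gaussians has a closed form. The function names are illustrative and do not come from the paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tv_equal_variance_gaussians(mu_p: float, mu_q: float, sigma: float) -> float:
    """TV distance between N(mu_p, sigma^2) and N(mu_q, sigma^2).

    For equal variances the densities cross at the midpoint of the means,
    which gives TV = 2 * Phi(|mu_p - mu_q| / (2 * sigma)) - 1.
    """
    z = abs(mu_p - mu_q) / (2.0 * sigma)
    return 2.0 * std_normal_cdf(z) - 1.0


def blurred_tv_gaussian(mu_p: float, mu_q: float, sigma: float, h: float) -> float:
    """Blurred TV under the ASSUMED definition: convolve both distributions
    with N(0, h^2), then take the TV distance. Convolving Gaussians adds
    variances, so each blurred distribution has variance sigma^2 + h^2.
    """
    blurred_sigma = math.sqrt(sigma ** 2 + h ** 2)
    return tv_equal_variance_gaussians(mu_p, mu_q, blurred_sigma)


if __name__ == "__main__":
    # h = 0 recovers the unblurred TV, Phi(1) - Phi(-1), about 0.683.
    for h in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(f"h = {h:>3}: blurred TV = {blurred_tv_gaussian(1.0, -1.0, 1.0, h):.4f}")
```

In this special case the value decreases strictly as $h$ grows. The multimodal kernel in the right panel of Figure 1 has no such closed form and is not covered by this sketch.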

Theorems & Definitions (30)

Definition 1: Distribution-free confidence bounds
Theorem 1.1
Definition 2: Blurred total variation distance
Proposition 2.1
Proposition 2.2
Theorem 2.3
Proposition 2.4
Theorem 3.1
Theorem 3.2
Theorem 3.3
...and 20 more
