Distribution-free two-sample testing with blurred total variation distance
Rohan Hore, Rina Foygel Barber
TL;DR
This work addresses distribution-free two-sample testing by introducing the blurred total variation (TV) distance, a smoothing-based relaxation of the classical TV distance. It provides distribution-free lower and upper confidence bounds for blurred TV, along with Monte Carlo estimators and bandwidth-adaptive schemes that remain valid without distributional assumptions. A key insight is that inference quality depends on the intrinsic rather than the ambient dimension, so the guarantees stay meaningful when the data lie on or near a low-dimensional structure. The approach offers practical tools for hypothesis testing and model evaluation in high-dimensional nonparametric settings; proofs are deferred to the appendix. Blurred TV thus serves as a principled, tractable surrogate for TV that preserves interpretability while enabling assumption-free inference.
Abstract
Two-sample testing, where we aim to determine whether two distributions are equal based on samples from each one, is challenging when we cannot place assumptions on the properties of the two distributions. In particular, certifying equality of distributions, or even providing a tight upper bound on the total variation (TV) distance between the distributions, is impossible to achieve in a distribution-free regime. In this work, we examine the blurred TV distance, a relaxation of TV distance that enables us to perform inference without assumptions on the distributions. We provide theoretical guarantees for distribution-free upper and lower bounds on the blurred TV distance, and examine its properties in high dimensions.
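To make the idea of "blurring" concrete, the sketch below computes a simple plug-in quantity for one-dimensional samples. It assumes that blurred TV at bandwidth h means the TV distance between the two distributions after each is convolved with a Gaussian kernel of scale h. That definition is an assumption here, since the text above does not state it. The function names (`smoothed_density`, `blurred_tv_plugin`) and the parameters `grid_size` and `pad` are hypothetical. This is a naive plug-in estimate only, not the distribution-free confidence bounds developed in the paper.

```python
import numpy as np


def smoothed_density(samples, grid, h):
    """Density of the empirical distribution of `samples` convolved with
    N(0, h^2), evaluated at each point of `grid` (a Gaussian mixture)."""
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))


def blurred_tv_plugin(x, y, h, grid_size=4001, pad=6.0):
    """Plug-in blurred TV between two 1-D samples (assumed definition):
    0.5 * integral of |p_h - q_h|, approximated by a Riemann sum."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Extend the grid `pad` bandwidths beyond the data so the Gaussian tails are covered.
    lo = min(x.min(), y.min()) - pad * h
    hi = max(x.max(), y.max()) + pad * h
    grid = np.linspace(lo, hi, grid_size)
    p = smoothed_density(x, grid, h)
    q = smoothed_density(y, grid, h)
    dx = grid[1] - grid[0]
    return 0.5 * np.sum(np.abs(p - q)) * dx


# Usage: two samples with a small mean shift, evaluated at two bandwidths.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)
y = rng.normal(0.5, 1.0, size=200)
for h in (0.1, 1.0):
    print(h, blurred_tv_plugin(x, y, h))
```

Under this assumed definition, the value for two fixed distributions does not increase as the bandwidth grows, because convolving both with the same kernel cannot increase their TV distance. The bandwidth therefore trades sensitivity to fine-scale differences against how tractable the quantity is to infer.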
