Integral Probability Metrics Meet Neural Networks: The Radon-Kolmogorov-Smirnov Test
Seunghoon Paik, Michael Celentano, Alden Green, Ryan J. Tibshirani
TL;DR
The paper introduces the Radon-Kolmogorov-Smirnov (RKS) test, a multivariate two-sample test based on an IPM whose function class is the unit ball of a Radon bounded variation (RBV) space of smoothness degree $k \geq 0$. It proves a representer theorem showing that the witness of this IPM is a ridge spline of degree $k$, so that computing the test statistic amounts to training a two-layer neural network under a path-norm constraint. It also derives the asymptotic null distribution, establishes consistency (asymptotically full power against any $P \neq Q$), and reports performance competitive with kernel MMD in a range of settings. Practically, RKS uses neural-network optimization to approximate the IPM witness, can be calibrated in finite samples by permutation, and offers sensitivity to tail and anisotropic differences through higher smoothness degrees $k$. RKS generalizes the KS test to multiple dimensions with tunable smoothness, complementing kernel-based tests for nonparametric two-sample problems, with potential applications in anomaly detection, distribution-shift detection, and robust generative modeling.
Abstract
Integral probability metrics (IPMs) constitute a general class of nonparametric two-sample tests that are based on maximizing the mean difference between samples from one distribution $P$ versus another $Q$, over all choices of data transformations $f$ living in some function space $\mathcal{F}$. Inspired by recent work that connects what are known as functions of $\textit{Radon bounded variation}$ (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study the IPM defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness degree $k \geq 0$. This test, which we refer to as the $\textit{Radon-Kolmogorov-Smirnov}$ (RKS) test, can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness. It is also intimately connected to neural networks: we prove that the witness in the RKS test -- the function $f$ achieving the maximum mean difference -- is always a ridge spline of degree $k$, i.e., a single neuron in a neural network. We can thus leverage the power of modern neural network optimization toolkits to (approximately) maximize the criterion that underlies the RKS test. We prove that the RKS test has asymptotically full power at distinguishing any distinct pair $P \not= Q$ of distributions, derive its asymptotic null distribution, and carry out experiments to elucidate the strengths and weaknesses of the RKS test versus the more traditional kernel MMD test.
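To make the criterion concrete, the following is a minimal sketch of an IPM restricted to single ridge splines $x \mapsto (w^\top x - b)_+^k$, with permutation calibration. It rests on several assumptions that are not taken from the paper: the function name `rks_permutation_test` is hypothetical, the maximization uses a crude random search over candidate neurons rather than the neural network optimization the abstract refers to, the direction $w$ is normalized to unit Euclidean norm, and the offsets $b$ are drawn from projected data points. The exact admissible set of $(w, b)$ should be taken from the paper itself.

```python
import numpy as np

def rks_permutation_test(X, Y, k=1, n_candidates=500, n_perm=199, seed=0):
    """Random-search approximation of a ridge-spline IPM with a permutation p-value.

    X: (m, d) sample from P, Y: (n, d) sample from Q, k: smoothness degree.
    This is an illustrative sketch, not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    pooled = np.vstack([X, Y])
    N, d = pooled.shape

    # Candidate directions w, normalized to unit norm (assumed normalization).
    W = rng.standard_normal((n_candidates, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    # Project all points onto each direction; take each offset b from a
    # randomly chosen projected data point (assumed heuristic).
    proj = pooled @ W.T                                  # shape (N, C)
    anchor = rng.integers(0, N, size=n_candidates)
    B = proj[anchor, np.arange(n_candidates)]            # shape (C,)

    # Evaluate every candidate ridge spline (w^T x - b)_+^k on every point.
    Z = proj - B
    A = (Z >= 0).astype(float) if k == 0 else np.maximum(Z, 0.0) ** k

    def stat(order):
        # Largest absolute mean difference over the fixed candidate set,
        # with the first m indices of `order` playing the role of sample 1.
        mask = np.zeros(N, dtype=bool)
        mask[order[:m]] = True
        return np.abs(A[mask].mean(axis=0) - A[~mask].mean(axis=0)).max()

    observed = stat(np.arange(N))
    perm_stats = np.array([stat(rng.permutation(N)) for _ in range(n_perm)])

    # Standard permutation p-value with the +1 correction.
    p_value = (1 + np.sum(perm_stats >= observed)) / (1 + n_perm)
    return observed, p_value
```

Calling `rks_permutation_test(X, Y, k=1)` on two arrays with the same number of columns returns the approximate statistic and a permutation p-value. Because the candidate neurons are fixed before the labels are permuted, each permutation only re-averages precomputed activations. A gradient-based optimizer, as suggested by the neural network connection, would generally find a larger mean difference than this random search.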
