Nonparametric inference for ratios of densities via uniformly valid and powerful permutation tests
Alberto Bordino, Thomas B. Berrett
TL;DR
This work addresses nonparametric testing of density ratios. It introduces the density ratio permutation test, which remains uniformly valid under the null $H_0: g \propto r f$ by drawing weighted, nonuniform permutations of the pooled data. The paper connects discrepancy measures based on integral probability metrics (IPMs) with kernel methods, introduces the shifted maximum mean discrepancy $\mathrm{MMD}_{r,k}$ to handle density-ratio shifts, and proves consistency and minimax optimality under Sobolev smoothness with the scaling $\zeta = n^{2/(4s+d)}$. The framework extends to unknown ratios, which are estimated on separate training data, and to conditional two-sample testing for covariate-shift and related assumptions in transfer learning and causal inference; the finite-sample guarantees bound the type I error in terms of the estimation error through total-variation bounds. The authors support the theory with simulations and real-data applications (e.g., New York frisk, Stroop, Two Moons, diamonds) and provide software (DRPTR-DRPT). Overall, the paper offers a statistically rigorous tool for nonparametric density-ratio inference, applicable to detecting distributional shift and to diagnostics in transfer learning and causal inference.
Abstract
We propose the density ratio permutation test, a hypothesis test that assesses whether the ratio between two densities is proportional to a known function based on independent samples from each distribution. The test uses an efficient Markov chain Monte Carlo scheme to draw weighted permutations of the pooled data, yielding exchangeable samples and finite-sample validity. For power, if the statistic is an integral probability metric, our procedure is consistent under mild assumptions on the defining function class; specializing to a reproducing kernel Hilbert space, we introduce the shifted maximum mean discrepancy and prove minimax optimality of our test when a normalized difference between the densities lies in a Sobolev ball. We extend to the case of an unknown density ratio by estimating it on an independent training sample, and we derive type I error bounds in terms of the estimation error as well as power results. This allows adapting our method to conditional two-sample testing, making it a versatile tool for assessing covariate-shift and related assumptions, which frequently arise in transfer learning and causal inference. Finally, we validate our theoretical findings through experiments on both simulated and real-world datasets.
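
To make the weighted-permutation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes one-dimensional data, a known ratio function `r`, and that under $H_0: g \propto r f$ an assignment of the pooled points to the two samples has probability proportional to the product of `r` over the points placed in the second sample. The sampler shown (a pairwise heat-bath swap, run in a star-shaped scheme through a "hub" state to obtain exchangeable copies) is one standard construction assumed here for illustration; the abstract does not specify the exact scheme. The statistic is a simple placeholder (a reweighted mean difference), not the shifted maximum mean discrepancy. The function names `pairwise_swap_step`, `statistic`, and `drpt_pvalue` are hypothetical.

```python
import numpy as np

def pairwise_swap_step(assign, r_vals, n, rng):
    """One sweep of pairwise heat-bath swaps (assumed sampler, for illustration).

    `assign` is a permutation of pooled indices: slots 0..n-1 form the first
    sample, the remaining slots form the second sample.
    """
    m = len(assign) - n
    k = min(n, m)
    a = rng.permutation(n)[:k]        # slots drawn from the first sample
    b = n + rng.permutation(m)[:k]    # slots drawn from the second sample
    u, v = assign[a], assign[b]       # items currently held in those slots
    # Item u moves into the second-sample slot with prob r(u) / (r(u) + r(v)).
    p_swap = r_vals[u] / (r_vals[u] + r_vals[v])
    swap = rng.random(k) < p_swap
    new = assign.copy()
    new[a[swap]] = v[swap]
    new[b[swap]] = u[swap]
    return new

def statistic(z, assign, r_vals, n):
    """Placeholder statistic: |mean of sample 2 - r-weighted mean of sample 1|."""
    first, second = assign[:n], assign[n:]
    w = r_vals[first]
    return abs(z[second].mean() - np.sum(w * z[first]) / np.sum(w))

def drpt_pvalue(x, y, r, n_copies=99, n_sweeps=50, seed=0):
    """Permutation p-value from a star-shaped scheme around a hub state."""
    rng = np.random.default_rng(seed)
    z = np.concatenate([x, y])
    n = len(x)
    r_vals = r(z)
    identity = np.arange(len(z))
    t_obs = statistic(z, identity, r_vals, n)

    # Run the chain from the observed assignment to reach the hub.
    hub = identity
    for _ in range(n_sweeps):
        hub = pairwise_swap_step(hub, r_vals, n, rng)

    # Run independent chains from the hub to get the permuted copies.
    t_perm = np.empty(n_copies)
    for h in range(n_copies):
        cur = hub
        for _ in range(n_sweeps):
            cur = pairwise_swap_step(cur, r_vals, n, rng)
        t_perm[h] = statistic(z, cur, r_vals, n)

    return (1 + np.sum(t_perm >= t_obs)) / (1 + n_copies)
```

For example, with `x` drawn from $N(0,1)$ and `r = lambda t: np.exp(t)`, the null holds when `y` is drawn from $N(1,1)$, since $e^{x} e^{-x^2/2} \propto e^{-(x-1)^2/2}$. Because each sweep is reversible with respect to the assumed weighted-permutation distribution, the observed assignment and the copies are exchangeable under $H_0$, which is what makes the p-value valid for any choice of statistic.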
