Nyström Kernel Stein Discrepancy
Florian Kalinke, Zoltan Szabo, Bharath K. Sriperumbudur
TL;DR
The paper tackles goodness-of-fit (GoF) testing with the kernel Stein discrepancy (KSD) when the target density is known only up to a normalizing constant, a setting in which the standard quadratic-time estimators are impractical for large datasets. It introduces a Nyström-based KSD estimator that restricts the computation to a low-rank subspace spanned by $m$ Nyström points, achieving a runtime of $\mathcal O\left(mn+m^3\right)$ for $n$ samples with $m\ll n$, and proves its $\sqrt{n}$-consistency under a classical sub-Gaussian assumption. The authors extend the analysis to unbounded Stein kernels and provide a Nyström-based wild bootstrap for scalable estimation of the null distribution. The results are supported by theoretical guarantees (including corollaries based on decay rates) and by experiments showing speedups with competitive test power. By combining Nyström approximations with kernel Stein methods, this work enables scalable GoF testing for large-scale and high-dimensional problems, including MCMC validation and the evaluation of deep generative models.
Abstract
Kernel methods underpin many of the most successful approaches in data science and statistics, and they allow representing probability measures as elements of a reproducing kernel Hilbert space without loss of information. Recently, the kernel Stein discrepancy (KSD), which combines Stein's method with the flexibility of kernel techniques, has gained considerable attention. Through the Stein operator, KSD allows the construction of powerful goodness-of-fit tests where it is sufficient to know the target distribution up to a multiplicative constant. However, the typical U- and V-statistic-based KSD estimators suffer from a quadratic runtime complexity, which hinders their application in large-scale settings. In this work, we propose a Nyström-based KSD acceleration (with runtime $\mathcal O\left(mn+m^3\right)$ for $n$ samples and $m\ll n$ Nyström points), show its $\sqrt{n}$-consistency under a classical sub-Gaussian assumption, and demonstrate its applicability for goodness-of-fit testing on a suite of benchmarks. We also show the $\sqrt n$-consistency of the quadratic-time KSD estimator.
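
To make the $\mathcal O\left(mn+m^3\right)$ runtime in the abstract concrete, the following is a minimal sketch of a Nyström-type KSD estimate. It is not the authors' implementation. It assumes the Langevin Stein operator, a Gaussian base kernel with bandwidth `sigma`, uniformly subsampled Nyström points, and the projected form $\beta^\top K_{m,m}^{-}\beta$ with $\beta = \frac{1}{n} K_{m,n} 1_n$, where $K$ denotes the Stein kernel matrix. The names `stein_kernel_matrix`, `nystrom_ksd_sq`, `score`, `sigma`, and `tol` are hypothetical and chosen for illustration only.

```python
import numpy as np


def stein_kernel_matrix(X, Y, score, sigma=1.0):
    """Langevin Stein kernel h_p(x, y) for a Gaussian base kernel (assumed choice).

    X: (a, d) array, Y: (b, d) array, score: maps (n, d) -> (n, d) values of
    grad log p, which does not require the normalizing constant of p.
    """
    d = X.shape[1]
    SX, SY = score(X), score(Y)
    diff = X[:, None, :] - Y[None, :, :]            # pairwise x - y, shape (a, b, d)
    sq = np.sum(diff ** 2, axis=-1)                 # squared distances, shape (a, b)
    K = np.exp(-sq / (2.0 * sigma ** 2))            # Gaussian base kernel k(x, y)
    t1 = SX @ SY.T                                  # s(x)^T s(y)
    t2 = np.einsum("id,ijd->ij", SX, diff) / sigma ** 2    # s(x)^T grad_y k / k
    t3 = -np.einsum("jd,ijd->ij", SY, diff) / sigma ** 2   # s(y)^T grad_x k / k
    t4 = d / sigma ** 2 - sq / sigma ** 4           # trace(grad_x grad_y k) / k
    return K * (t1 + t2 + t3 + t4)


def nystrom_ksd_sq(X, score, m, sigma=1.0, seed=None, tol=1e-10):
    """Sketch of a Nyström-type squared KSD estimate.

    Uses m x n and m x m Stein kernel evaluations plus one m x m
    eigendecomposition, i.e. O(mn + m^3) up to the dimension factor.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Z = X[rng.choice(n, size=m, replace=False)]     # uniformly chosen Nyström points
    H_mn = stein_kernel_matrix(Z, X, score, sigma)  # (m, n)
    H_mm = stein_kernel_matrix(Z, Z, score, sigma)  # (m, m), symmetric PSD
    beta = H_mn.mean(axis=1)                        # (1/n) * H_mn @ 1_n
    # Pseudo-inverse via eigendecomposition, dropping near-zero eigenvalues
    # so that the returned quadratic form is non-negative.
    w, V = np.linalg.eigh(H_mm)
    keep = w > tol * w.max()
    coef = V[:, keep].T @ beta
    return float(np.sum(coef ** 2 / w[keep]))
```

For example, with samples `X` drawn from a standard normal target, the score is `lambda x: -x`, and `nystrom_ksd_sq(X, lambda x: -x, m=20)` returns the estimate without ever forming the full $n \times n$ Stein kernel matrix. Because this form is the squared norm of a projection of the empirical Stein embedding, it cannot exceed the V-statistic computed from the full matrix.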
