Minimax Optimal Goodness-of-Fit Testing with Kernel Stein Discrepancy
Omar Hagrass, Bharath Sriperumbudur, Krishnakumar Balasubramanian
TL;DR
This paper studies minimax-optimal goodness-of-fit testing on general domains using the kernel Stein discrepancy (KSD). It introduces an operator-theoretic representation of $D_{\mathrm{KSD}}$ and proposes a spectral-regularized discrepancy $D^2_{\lambda}$, along with a practical estimator $\hat{\mathbb S}_{\lambda}^{P}$ that is computed from the data without requiring full knowledge of the null distribution or extra samples from it. The authors derive minimax-optimal separation radii $\Delta_n$ under both polynomial and exponential eigenvalue decay, show that the unregularized KSD test is suboptimal, and provide an adaptive union test across $\lambda$ that achieves minimax optimality up to a $\log\log$ factor. Experiments on Euclidean, spherical, and infinite-dimensional domains show that the regularized KSD tests outperform their unregularized counterparts, while remaining computationally efficient via a U-statistic framework and wild bootstrap thresholds. The result is a goodness-of-fit test that applies across domains and comes with minimax guarantees and adaptation to unknown parameters.
Abstract
We explore the minimax optimality of goodness-of-fit tests on general domains using the kernelized Stein discrepancy (KSD). The KSD framework offers a flexible approach for goodness-of-fit testing, avoiding strong distributional assumptions, accommodating diverse data structures beyond Euclidean spaces, and relying only on partial knowledge of the reference distribution, while maintaining computational efficiency. Although KSD is a powerful framework for goodness-of-fit testing, only the consistency of the corresponding tests has been established so far, and their statistical optimality remains largely unexplored. In this paper, we develop a general framework and an operator-theoretic representation of the KSD, encompassing many existing KSD tests in the literature, which vary depending on the domain. Building on this representation, we propose a modified discrepancy by applying the concept of spectral regularization to the KSD framework. We establish the minimax optimality of the proposed regularized test for a wide range of the smoothness parameter $\theta$ under a specific alternative space, defined over general domains, using the $\chi^2$-divergence as the separation metric. In contrast, we demonstrate that the unregularized KSD test fails to achieve the minimax separation rate for the considered alternative space. Additionally, we introduce an adaptive test capable of achieving minimax optimality up to a logarithmic factor by adapting to unknown parameters. Through numerical experiments, we illustrate the superior performance of our proposed tests across various domains compared to their unregularized counterparts.
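For orientation, the unregularized baseline that the regularized tests are compared against can be sketched in a few lines. This is not the paper's estimator $\hat{\mathbb S}_{\lambda}^{P}$ and involves no spectral regularization. It is a minimal illustration under assumptions that do not come from the text above: a Euclidean domain, the Langevin Stein operator, a Gaussian base kernel with bandwidth `bandwidth`, a standard normal null (score $s(x) = -x$), and Rademacher multipliers for the wild bootstrap. The function names `stein_kernel_matrix`, `ksd_u_statistic`, and `wild_bootstrap_test` are hypothetical.

```python
import numpy as np


def stein_kernel_matrix(X, score, bandwidth=1.0):
    """Langevin Stein kernel h_p(x_i, x_j) for a Gaussian base kernel.

    Assumption: k(x, y) = exp(-||x - y||^2 / (2 h^2)) and `score` returns
    the gradient of the log density of the null, evaluated row-wise.
    """
    n, d = X.shape
    S = score(X)                                   # (n, d) null scores
    diff = X[:, None, :] - X[None, :, :]           # diff[i, j] = x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)
    h2 = bandwidth ** 2
    K = np.exp(-sqdist / (2.0 * h2))

    term_ss = S @ S.T                                   # s(x_i) . s(x_j)
    term_sx = np.einsum("id,ijd->ij", S, diff) / h2     # s(x_i) . grad_y k / k
    term_sy = -np.einsum("jd,ijd->ij", S, diff) / h2    # s(x_j) . grad_x k / k
    term_tr = d / h2 - sqdist / h2 ** 2                 # trace of mixed Hessian / k
    return K * (term_ss + term_sx + term_sy + term_tr)


def ksd_u_statistic(H):
    """U-statistic estimate of the squared KSD (off-diagonal average)."""
    n = H.shape[0]
    return (H.sum() - np.trace(H)) / (n * (n - 1))


def wild_bootstrap_test(H, n_boot=500, alpha=0.05, rng=None):
    """Compare the U-statistic with a wild bootstrap (1 - alpha) quantile."""
    rng = np.random.default_rng(rng)
    n = H.shape[0]
    H0 = H - np.diag(np.diag(H))                   # drop the diagonal terms
    stat = H0.sum() / (n * (n - 1))
    eps = rng.choice([-1.0, 1.0], size=(n_boot, n))     # Rademacher multipliers
    boot = np.einsum("bi,ij,bj->b", eps, H0, eps) / (n * (n - 1))
    threshold = np.quantile(boot, 1.0 - alpha)
    return stat, threshold, stat > threshold


# Usage: test H0: P = N(0, I) on data drawn from a mean-shifted Gaussian.
data_rng = np.random.default_rng(0)
X_alt = data_rng.normal(loc=2.0, size=(200, 1))
H_alt = stein_kernel_matrix(X_alt, score=lambda x: -x)
stat, threshold, reject = wild_bootstrap_test(H_alt, n_boot=200, rng=1)
print(f"KSD^2 U-statistic = {stat:.3f}, threshold = {threshold:.3f}, reject = {reject}")
```

Only the score of the null enters the computation, which reflects the abstract's point that KSD relies on partial knowledge of the reference distribution. The sketch forms the full pairwise difference array, so it is suited to small samples only.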
