Integral-Operator-Based Spectral Algorithms for Goodness-of-Fit Tests
Shiwei Sang, Shao-Bo Lin, Xuehu Zhu
TL;DR
This work studies goodness-of-fit (GOF) testing in finite samples and addresses the limited discriminative power of the maximum mean discrepancy (MMD) through a general spectral-regularization framework built on the integral operator $L_K$. It introduces a class of spectral-filtered kernel GOF statistics $\xi_{\lambda}(P,P_0)$ that generalizes earlier regularization schemes (e.g., Tikhonov) and removes restrictive kernel assumptions, achieving non-asymptotic Type I error control and improved detection power. Theoretical guarantees establish a nonparametric detection boundary of order $n^{-\frac{4r}{4r+s}}$ under effective-dimension conditions, and numerical experiments corroborate finite-sample validity and robustness across several spectral filters. The flexibility of the filters, the broad kernel applicability, and the data-driven calibration schemes (based on the empirical effective dimension or on permutations) make the proposed tests promising for GOF testing in high-dimensional or structured data settings.
Abstract
The widespread adoption of the \emph{maximum mean discrepancy} (MMD) in goodness-of-fit testing has spurred extensive research on its statistical performance. However, recent studies indicate that the inherent structure of MMD may limit its ability to distinguish between distributions, leaving room for improvement. Regularization techniques can potentially overcome this limitation by refining the discrepancy measure. In this paper, we introduce a family of regularized kernel-based discrepancy measures constructed via spectral filtering. Our framework is a natural generalization of prior studies: it removes restrictive assumptions on both the kernel functions and the filter functions, thereby broadening both the methodological scope and the theoretical coverage. We establish non-asymptotic guarantees showing that the resulting tests achieve valid Type~I error control and improved power. Numerical experiments demonstrate the broader generality and competitive performance of the proposed tests compared with existing methods.
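To make the idea of spectral filtering concrete, the following Python sketch computes one plausible sample version of a spectral-filtered discrepancy. It is an illustration under stated assumptions, not the estimator defined in the paper: it assumes the statistic has the form $\|g_\lambda^{1/2}(L_K)(\mu_P - \mu_{P_0})\|^2$ with $\mu_P$ the kernel mean embedding, estimates $L_K$ from a reference sample `z` drawn from $P_0$, uses a Gaussian kernel, and ignores the part of the mean difference lying outside the span of the reference features. The function names (`spectral_gof_statistic`, `tikhonov`, `spectral_cutoff`) and all parameter values are hypothetical.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def tikhonov(x, lam):
    """Tikhonov filter g_lam(x) = 1 / (x + lam)."""
    return 1.0 / (x + lam)


def spectral_cutoff(x, lam):
    """Spectral cut-off filter: 1/x on eigenvalues >= lam, 0 elsewhere."""
    return np.where(x >= lam, 1.0 / np.maximum(x, lam), 0.0)


def spectral_gof_statistic(x, z, lam=0.1, filt=tikhonov, bandwidth=1.0, tol=1e-10):
    """Illustrative spectral-filtered discrepancy between sample x and P_0.

    x : (n, d) observed sample from the unknown distribution P
    z : (m, d) reference sample from P_0 (assumption: P_0 can be sampled)
    """
    m = z.shape[0]
    k_zz = gaussian_gram(z, z, bandwidth)
    k_zx = gaussian_gram(z, x, bandwidth)

    # Nonzero eigenvalues of the empirical integral operator coincide with
    # those of the scaled Gram matrix K_zz / m.
    sigma, u = np.linalg.eigh(k_zz / m)
    keep = sigma > tol  # drop numerically zero or negative eigenvalues
    sigma, u = sigma[keep], u[:, keep]

    # d_i = <K(., z_i), mu_x - mu_z>: difference of empirical mean embeddings
    # evaluated at each reference point.
    d = k_zx.mean(axis=1) - k_zz.mean(axis=1)

    # Coefficients of the mean difference along the unit-norm eigenfunctions.
    coef = (u.T @ d) / np.sqrt(m * sigma)

    # Reweight each spectral direction by the filter and sum.
    return float(np.sum(filt(sigma, lam) * coef**2))


# Small demonstration on synthetic one-dimensional data (hypothetical setup).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))                # reference sample from P_0 = N(0, 1)
x_null = rng.normal(size=(200, 1))           # sample drawn from P_0 itself
x_alt = rng.normal(loc=2.0, size=(200, 1))   # sample from a shifted alternative

stat_null = spectral_gof_statistic(x_null, z)
stat_alt = spectral_gof_statistic(x_alt, z)
stat_cut = spectral_gof_statistic(x_alt, z, filt=spectral_cutoff)
```

Swapping `filt` changes only how each eigen-direction is weighted, which is the sense in which different filters give different members of one family of statistics. A rejection threshold would still have to be calibrated separately, for example by a permutation scheme or an effective-dimension estimate, neither of which is sketched here.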
