
Integral-Operator-Based Spectral Algorithms for Goodness-of-Fit Tests

Shiwei Sang, Shao-Bo Lin, Xuehu Zhu

TL;DR

This work tackles goodness-of-fit (GOF) testing in finite-sample regimes. It addresses the limitations of the maximum mean discrepancy (MMD) through a broad spectral-regularization framework built on the integral operator $L_K$. It introduces a class of spectral-filtered kernel GOF statistics $\xi_{\lambda}(P,P_0)$ that generalize prior regularization schemes (e.g., Tikhonov) and remove restrictive kernel assumptions, achieving non-asymptotic Type I error control and improved detection power. The theoretical guarantees give a nonparametric detection boundary of order $n^{-\frac{4r}{4r+s}}$ under effective-dimension conditions, and numerical experiments corroborate finite-sample validity and robustness across various spectral filters. Owing to their flexibility, broad kernel applicability, and data-driven calibration schemes (based on the empirical effective dimension or on permutation), the proposed methods show strong practical potential for GOF testing in high-dimensional or structured data settings.
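The TL;DR mentions Tikhonov regularization and robustness across various spectral filters. As a minimal sketch, assuming the filters in question behave like those standard in the spectral-regularization literature, the code below lists four filter functions $g_\lambda(\sigma)$ (Tikhonov, iterated Tikhonov, spectral cut-off, Landweber). The function names and the defaults `nu=3` and `eta=1.0` are illustrative choices, not taken from the paper. Each filter approximates $1/\sigma$ as $\lambda \to 0$, and the checks test two properties commonly required of filters on $[0, 1]$: $\sigma g_\lambda(\sigma) \le 1$ and $\lambda g_\lambda(\sigma) \le C$. The exact admissibility conditions used in the paper may differ.

```python
import numpy as np

# Illustrative spectral filters g_lam(sigma); each approximates 1/sigma as lam -> 0.

def tikhonov(s, lam):
    # g(s) = 1 / (s + lam)
    return 1.0 / (np.asarray(s, dtype=float) + lam)

def iterated_tikhonov(s, lam, nu=3):
    # g(s) = (1 - (lam / (s + lam))^nu) / s, with limit nu / lam at s = 0
    s = np.asarray(s, dtype=float)
    out = np.full_like(s, nu / lam)
    pos = s > 0
    out[pos] = (1.0 - (lam / (s[pos] + lam)) ** nu) / s[pos]
    return out

def spectral_cutoff(s, lam):
    # g(s) = 1 / s if s >= lam, else 0 (np.maximum avoids dividing by zero)
    s = np.asarray(s, dtype=float)
    return np.where(s >= lam, 1.0 / np.maximum(s, lam), 0.0)

def landweber(s, lam, eta=1.0):
    # t = ceil(1 / lam) gradient steps: g(s) = (1 - (1 - eta * s)^t) / s,
    # with limit eta * t at s = 0; assumes the spectrum lies in [0, 1 / eta],
    # e.g. for a kernel bounded by 1 such as the Gaussian kernel.
    t = int(np.ceil(1.0 / lam))
    s = np.asarray(s, dtype=float)
    out = np.full_like(s, eta * t)
    pos = s > 0
    out[pos] = (1.0 - (1.0 - eta * s[pos]) ** t) / s[pos]
    return out

FILTERS = {
    "tikhonov": tikhonov,
    "iterated_tikhonov": iterated_tikhonov,
    "spectral_cutoff": spectral_cutoff,
    "landweber": landweber,
}

if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 1001)
    for name, g in FILTERS.items():
        vals = g(grid, 0.05)
        print(name, np.max(grid * vals), np.max(0.05 * vals))
```

Because a filter acts only on eigenvalues, switching regularization schemes amounts to swapping one entry of `FILTERS` for another.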

Abstract

The widespread adoption of the maximum mean discrepancy (MMD) in goodness-of-fit testing has spurred extensive research on its statistical performance. However, recent studies indicate that the inherent structure of MMD may constrain its ability to distinguish between distributions, leaving room for improvement. Regularization techniques can overcome this limitation by refining the discrepancy measure. In this paper, we introduce a family of regularized kernel-based discrepancy measures constructed via spectral filtering. Our framework can be regarded as a natural generalization of prior studies: it removes restrictive assumptions on both kernel functions and filter functions, thereby broadening its methodological scope and theoretical inclusiveness. We establish non-asymptotic guarantees showing that the resulting tests achieve valid Type I error control and enhanced power. Numerical experiments demonstrate the generality and competitive performance of the proposed tests compared with existing methods.
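The abstract describes discrepancies obtained by spectrally filtering a kernel operator. The sketch below is a plug-in illustration in the spirit of $\xi_{\lambda}(P,P_0)$, not the authors' estimator, whose exact form this summary does not give. It assumes that $L_K$ is the integral operator with respect to $P_0$ and that it is estimated from an auxiliary $P_0$ sample $Z = (z_1,\dots,z_N)$. It then evaluates the V-statistic $\langle f, g_\lambda(\hat L_K) f\rangle_{\mathcal H}$ with $f = \hat\mu_P - \hat\mu_{P_0}$. The computation uses the fact that if $(\sigma_i, u_i)$ are eigenpairs of $K_{ZZ}/N$ with $\sigma_i > 0$, then $\phi_i = (N\sigma_i)^{-1/2}\sum_j u_{ij} K(\cdot, z_j)$ are orthonormal eigenfunctions of $\hat L_K$ with eigenvalues $\sigma_i$, while $\hat L_K$ vanishes on their orthogonal complement. The names `filtered_discrepancy` and `gaussian_gram` and the unit bandwidth are hypothetical.

```python
import numpy as np

def gaussian_gram(A, B, bandwidth=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)); hypothetical default bandwidth
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def tikhonov(s, lam):
    # Tikhonov filter g(s) = 1 / (s + lam)
    return 1.0 / (s + lam)

def filtered_discrepancy(X, Y, Z, kernel, g, lam, tol=1e-10):
    """Plug-in estimate of <f, g(L_hat) f>_H with f = mu_hat(X) - mu_hat(Y).

    X: sample from P; Y: sample from P0 (mean embedding of the null);
    Z: separate sample from P0 forming L_hat = (1/N) sum_j k(., z_j) (x) k(., z_j).
    """
    N = Z.shape[0]
    # The nonzero eigenvalues of L_hat coincide with those of K_ZZ / N.
    sig, U = np.linalg.eigh(kernel(Z, Z) / N)
    keep = sig > tol * sig.max()
    sig, U = sig[keep], U[:, keep]
    # f(z_j) = mean_x k(z_j, x) - mean_y k(z_j, y)
    f_at_z = kernel(Z, X).mean(axis=1) - kernel(Z, Y).mean(axis=1)
    # Coefficients <f, phi_i>_H in the orthonormal eigenfunctions phi_i of L_hat
    coef = U.T @ f_at_z / np.sqrt(N * sig)
    # ||f||_H^2 equals the biased (V-statistic) squared MMD
    f_norm2 = kernel(X, X).mean() - 2.0 * kernel(X, Y).mean() + kernel(Y, Y).mean()
    # Mass of f outside the retained eigenfunctions, where L_hat is (numerically) zero
    residual = max(f_norm2 - np.sum(coef**2), 0.0)
    g_zero = float(np.asarray(g(np.zeros(1), lam))[0])
    return float(np.sum(g(sig, lam) * coef**2) + g_zero * residual)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2)) + 0.5   # sample from an alternative P
    Y = rng.normal(size=(100, 2))         # sample from P0 = N(0, I)
    Z = rng.normal(size=(200, 2))         # auxiliary P0 sample for L_hat
    print(filtered_discrepancy(X, Y, Z, gaussian_gram, tikhonov, lam=0.01))
```

With the identity filter $g \equiv 1$ the function reduces to the biased MMD$^2$, and with a linear kernel and the Tikhonov filter it reduces to $f^\top(\hat\Sigma + \lambda I)^{-1} f$, which are two easy sanity checks.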

Paper Structure

This paper contains 23 sections, 19 theorems, 166 equations, 9 figures, and 2 algorithms.

Key Result

Lemma 1

Assume that (equ: estimation error), (equ: upper bound critical value), and (…) hold for any $P \in \mathcal{P}(\mathcal{C}, \rho, \Delta)$. Then the following holds: …

Figures (9)

  • Figure 1: Distributions of the test statistic under $H_0$ and $H_1$, and the trade-off between two types of errors.
  • Figure 2: Empirical variance of the proposed statistic compared with that of the method in hagrass-gof-test under the null hypothesis, from simulations with standard normal data $N(0,I_d)$ and a Gaussian kernel. The three panels show how the empirical variance varies with the sample size (fixing $d = 10$, $\lambda = 0.01$), the dimension (fixing $n = m = 200$, $\lambda = 0.01$), and the regularization parameter (fixing $n = m = 200$, $d = 10$). The integral operator and the centered covariance operator are each estimated from $200$ samples.
  • Figure 3: (a) $m(x)$ for $c = 0.8$; (b) Alternative density functions for various $c$ values.
  • Figure 4: Power comparison using Tikhonov regularization under different regularization parameters.
  • Figure 5: Power comparison across dimensions under Gaussian mean alternatives.
  • ...and 4 more figures
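Figures 4 and 5 report power comparisons, and the TL;DR lists permutation as one calibration scheme. As a hedged sketch of how such empirical power values are typically produced (not the authors' experimental code), the block below calibrates a test by permutation, assuming an i.i.d. sample from $P_0$ is available. It then estimates power under a Gaussian mean-shift alternative as the Monte Carlo rejection rate. The statistic `gaussian_mmd2` is a stand-in (biased MMD$^2$ with a Gaussian kernel); a spectrally filtered statistic with the same `(X, Y)` signature could be passed instead. All function names and defaults are hypothetical.

```python
import numpy as np

def gaussian_mmd2(X, Y, bandwidth=1.0):
    """Biased (V-statistic) squared MMD with a Gaussian kernel; a stand-in statistic."""
    def gram(A, B):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))
    return gram(X, X).mean() - 2.0 * gram(X, Y).mean() + gram(Y, Y).mean()

def permutation_test(stat, X, Y, n_perm=199, alpha=0.05, rng=None):
    """Permutation p-value for stat(X, Y); large values are evidence against H0."""
    rng = np.random.default_rng() if rng is None else rng
    observed = stat(X, Y)
    pooled = np.vstack([X, Y])
    n = X.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        if stat(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    # Counting the observed statistic itself keeps the test at level alpha.
    p_value = (1 + exceed) / (1 + n_perm)
    return p_value, p_value <= alpha

def empirical_power(stat, n, d, shift, reps=50, n_perm=99, alpha=0.05, seed=0):
    """Rejection rate when X ~ N(shift * 1_d, I_d) is tested against P0 = N(0, I_d)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        X = rng.normal(size=(n, d)) + shift  # sample from the alternative P
        Y = rng.normal(size=(n, d))          # sample from the null P0
        _, reject = permutation_test(stat, X, Y, n_perm=n_perm, alpha=alpha, rng=rng)
        rejections += int(reject)
    return rejections / reps

if __name__ == "__main__":
    for d in (2, 5, 10):
        print(d, empirical_power(gaussian_mmd2, n=50, d=d, shift=0.3, reps=20))
```

Setting `shift=0` estimates the empirical Type I error instead, which should stay near `alpha`.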

Theorems & Definitions (34)

  • Definition 1: Significance level
  • Definition 2: Detection boundary and optimality
  • Lemma 1
  • Proof
  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Lemma 2
  • Proof
  • Proposition 2: Sample error
  • ...and 24 more