Spectral Regularized Kernel Goodness-of-Fit Tests

Omar Hagrass, Bharath K. Sriperumbudur, Bing Li

TL;DR

The paper addresses goodness-of-fit testing in non-Euclidean spaces via RKHS embeddings and shows the classical MMD-based test is not minimax optimal under general conditions. It introduces a spectral regularization framework, defining $\eta_{\lambda}$ with a general regularizer $g_{\lambda}$, which subsumes Tikhonov regularization and relaxes the zero-mean kernel requirement. The authors develop an Oracle test, a computable two-sample statistic (when $P_0$ can be sampled), and two data-driven tests (SRCT and SRPT) with adaptation over regularization and kernels, proving minimax optimality (up to log factors) for broad classes of alternatives characterized by eigen-decay rates. Empirical results on periodic-spline, Gaussian, and directional data show that SRCT and SRPT outperform MMD, energy, KS, and SR2T, and adaptivity further improves performance. The work advances practical, minimax-optimal goodness-of-fit testing for non-Euclidean data and lays groundwork for kernel-regularization approaches in related hypothesis testing tasks.
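
To make the idea of a spectral regularizer concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`tikhonov`, `showalter`, `apply_regularizer`) are hypothetical, the Showalter form is taken from the general inverse-problems literature as one example of a non-Tikhonov regularizer, and a finite symmetric positive semi-definite matrix stands in for the covariance operator that the regularizer would act on.

```python
import numpy as np

def tikhonov(x, lam):
    """Tikhonov regularizer: g_lambda(x) = 1 / (x + lambda)."""
    return 1.0 / (np.asarray(x, dtype=float) + lam)

def showalter(x, lam):
    """Showalter regularizer: g_lambda(x) = (1 - exp(-x / lambda)) / x,
    with the limiting value 1 / lambda at x = 0 (assumed example)."""
    x = np.asarray(x, dtype=float)
    safe_x = np.where(x > 0, x, 1.0)  # avoid dividing by zero below
    return np.where(x > 0, -np.expm1(-safe_x / lam) / safe_x, 1.0 / lam)

def apply_regularizer(sym_matrix, g, lam):
    """Apply g_lambda to a symmetric PSD matrix via its eigendecomposition.

    The matrix is a finite-dimensional stand-in (an assumption of this
    sketch) for the operator the regularizer acts on.
    """
    eigvals, eigvecs = np.linalg.eigh(sym_matrix)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return (eigvecs * g(eigvals, lam)) @ eigvecs.T
```

Both functions behave like $1/x$ for eigenvalues well above $\lambda$ and stay bounded by $1/\lambda$ near zero, which is the stabilizing effect that regularization is meant to provide.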

Abstract

Maximum mean discrepancy (MMD) has enjoyed a lot of success in many machine learning and statistical applications, including non-parametric hypothesis testing, because of its ability to handle non-Euclidean data. Recently, it has been demonstrated in Balasubramanian et al.(2021) that the goodness-of-fit test based on MMD is not minimax optimal while a Tikhonov regularized version of it is, for an appropriate choice of the regularization parameter. However, the results in Balasubramanian et al. (2021) are obtained under the restrictive assumptions of the mean element being zero, and the uniform boundedness condition on the eigenfunctions of the integral operator. Moreover, the test proposed in Balasubramanian et al. (2021) is not practical as it is not computable for many kernels. In this paper, we address these shortcomings and extend the results to general spectral regularizers that include Tikhonov regularization.
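
For readers unfamiliar with MMD, the following Python sketch computes the standard unbiased two-sample estimate of the squared MMD. It shows only the plain, unregularized statistic that the abstract refers to, not the regularized tests proposed in the paper. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration, and the second sample `y` is assumed to be drawn from the null distribution.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased U-statistic estimate of the squared MMD between two samples."""
    n, m = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude diagonal terms so the within-sample averages are unbiased.
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
null_sample = rng.normal(size=(200, 2))             # draws from the null
same_dist = rng.normal(size=(200, 2))               # data matching the null
shifted = rng.normal(loc=3.0, size=(200, 2))        # data with a mean shift

mmd_same = mmd2_unbiased(same_dist, null_sample)    # close to zero
mmd_shifted = mmd2_unbiased(shifted, null_sample)   # clearly positive
```

A test rejects the null when the statistic exceeds a threshold. How that threshold is chosen, and how regularization changes the statistic, is the subject of the paper itself.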

Paper Structure (36 sections, 26 theorems, 193 equations, 6 figures)

This paper contains 36 sections, 26 theorems, 193 equations, 6 figures.

Key Result

Theorem 1

Let $n\geq 2$ and Then for any $\alpha>0$, $\delta>0$, $P_{H_0}\{\hat{D}_{\mathrm{MMD}}^2 \geq \gamma\} \leq \alpha,$ where $\gamma = \frac{4\kappa}{\sqrt{\alpha}n},$ and $c(\alpha,\delta)\asymp\max\{\alpha^{-1/2},\delta^{-1}\}.$ Furthermore if $\Delta_{n} < d_{\alpha}n^{\frac{-2\theta}{2\theta+1}}$ for some $d_{\alpha}>0$ and one of the following holds: (i) $\theta \geq \frac{1}{2}$, (ii) $\sup_

Figures (6)

  • Figure 1: Oracle test (M3D) and SRPT to test for uniformity using the periodic spline kernel on $[0,1]$. $P$ denotes the degree of perturbation; a larger $P$ brings the alternative distribution (i.e., the perturbed uniform distribution) closer to the null (uniform distribution).
  • Figure 2: Power for Gaussian shift experiments with different $d$ and $s$ using $n=200$.
  • Figure 3: Power for Gaussian covariance scale experiments with different $d$ and $s$ using $n=200.$
  • Figure 4: Power for perturbed uniform distributions for $d=1$ ($n=500$) and $d=2$ ($n=2000$).
  • Figure 5: Power for von Mises-Fisher distribution with different concentration parameter $k$ and $s$ using $n=500.$
  • ...and 1 more figure

Theorems & Definitions (44)

  • Example 1: Uniform distribution on $[0,1]$
  • Example 2: Uniform distribution on $\mathbb{S}^2$
  • Example 3: Gaussian distribution with Mehler kernel on $\mathbb{R}$
  • Theorem 1: Separation boundary of MMD test
  • Remark 1
  • Theorem 2: Minimax separation boundary
  • Remark 2
  • Theorem 3: Critical region--Oracle
  • Theorem 4: Separation boundary--Oracle
  • Corollary 5: Polynomial decay--Oracle
  • ...and 34 more