Nyström Kernel Stein Discrepancy

Florian Kalinke, Zoltan Szabo, Bharath K. Sriperumbudur

TL;DR

The paper tackles goodness-of-fit (GoF) testing with the kernel Stein discrepancy (KSD) when the target density is known only up to a normalizing constant, a setting where the standard quadratic-time estimators are impractical for large datasets. It introduces a Nyström-based KSD estimator that projects the KSD onto a low-rank subspace, achieving a runtime of $O(mn+m^3)$ with $m\ll n$, and proves $\sqrt{n}$-consistency under a classical sub-Gaussian assumption. The authors extend the analysis to unbounded Stein kernels and provide a Nyström-based wild bootstrap for scalable null distribution estimation. The results are supported by theoretical guarantees (including decay-rate-based corollaries) and by experiments showing speedups at competitive testing power. By combining Nyström approximations with kernel Stein methods, the work enables scalable GoF testing for large-scale and high-dimensional problems, including MCMC validation and the evaluation of deep generative models.

Abstract

Kernel methods underpin many of the most successful approaches in data science and statistics, and they allow representing probability measures as elements of a reproducing kernel Hilbert space without loss of information. Recently, the kernel Stein discrepancy (KSD), which combines Stein's method with the flexibility of kernel techniques, has gained considerable attention. Through the Stein operator, KSD allows the construction of powerful goodness-of-fit tests where it is sufficient to know the target distribution up to a multiplicative constant. However, the typical U- and V-statistic-based KSD estimators suffer from a quadratic runtime complexity, which hinders their application in large-scale settings. In this work, we propose a Nyström-based KSD acceleration (with runtime $\mathcal O\left(mn+m^3\right)$ for $n$ samples and $m\ll n$ Nyström points), show its $\sqrt{n}$-consistency under a classical sub-Gaussian assumption, and demonstrate its applicability for goodness-of-fit testing on a suite of benchmarks. We also show the $\sqrt n$-consistency of the quadratic-time KSD estimator.
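To make the quadratic cost concrete, the following is a minimal sketch of a V-statistic KSD estimate. It rests on assumptions not stated in the text above: the Langevin Stein kernel with a Gaussian (RBF) base kernel of bandwidth `sigma`, a standard normal target whose score is $s(\mathbf x) = -\mathbf x$, and the hypothetical helper names `stein_kernel_matrix` and `ksd_vstat`. Only the score of the target enters, which is why a density known up to a multiplicative constant suffices.

```python
import numpy as np

def stein_kernel_matrix(X, Y, score, sigma=1.0):
    """Pairwise Langevin Stein kernel h_p(x, y) built on an RBF base kernel.

    Assumption: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); `score` maps an
    (n, d) array to the (n, d) array of gradients of log p.
    """
    d = X.shape[1]
    sX, sY = score(X), score(Y)                      # shapes (a, d), (b, d)
    diff = X[:, None, :] - Y[None, :, :]             # x - y, shape (a, b, d)
    sqdist = np.sum(diff**2, axis=-1)                # ||x - y||^2
    K = np.exp(-sqdist / (2 * sigma**2))             # base kernel values
    term_ss = sX @ sY.T                              # s(x)^T s(y)
    term_sx = np.einsum("ad,abd->ab", sX, diff) / sigma**2    # s(x)^T grad_y k / k
    term_sy = -np.einsum("bd,abd->ab", sY, diff) / sigma**2   # s(y)^T grad_x k / k
    term_tr = d / sigma**2 - sqdist / sigma**4       # trace of grad_x grad_y k / k
    return K * (term_ss + term_sx + term_sy + term_tr)

def ksd_vstat(X, score, sigma=1.0):
    """V-statistic estimate of the squared KSD: mean over all n^2 pairs."""
    H = stein_kernel_matrix(X, X, score, sigma)      # O(n^2) time and memory
    return float(H.mean())

# Hypothetical usage: target p = N(0, I), so the score is s(x) = -x.
rng = np.random.default_rng(0)
score = lambda X: -X
X_null = rng.standard_normal((300, 2))               # sample from the target
X_alt = X_null + 1.0                                 # sample from a shifted law
print(ksd_vstat(X_null, score), ksd_vstat(X_alt, score))
```

The full $n\times n$ matrix `H` is what the Nyström approach avoids forming.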

Paper Structure (25 sections, 17 theorems, 48 equations, 3 figures)

Key Result

Lemma 1

The squared KSD estimator (eq:KSD-estimator) takes the form [...], where $\bm\beta_p = \frac{1}{n}\mathbf K_{h_p,m,n} \bm1_{n}\in \mathbb{R}^m$, the Gram matrix is $\mathbf K_{h_p,m,m} = \left[h_p\left(\tilde{\mathbf x}_i,\tilde{\mathbf x}_j\right)\right]_{i,j=1}^m \in \mathbb{R}^{m\times m}$, and $\mathbf K_{h_p,m,n} = \left[h_p\left(\tilde{\mathbf x}_i,{\mathbf x}_j\right)\right]$ [...]

Figures (3)

  • Figure 1: Comparison of goodness-of-fit tests w.r.t. their runtime and their power.
  • Figure 2: Runtime and power trade-off of the tested approximations.
  • Figure 3: Impact of different choices of factor $c$ for the number of Nyström samples $m=c\sqrt n$.
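For the choice $m=c\sqrt n$ studied in Figure 3, the stated runtime $O(mn+m^3)$ can be worked out directly. This is plain arithmetic on the quantities above, treating $c$ as a constant:

$$mn + m^3 = c\,n^{3/2} + c^3\,n^{3/2} = \left(c + c^3\right) n^{3/2}.$$

The cost is therefore $O\left(n^{3/2}\right)$, below the $O\left(n^2\right)$ of the U- and V-statistic estimators, and the $c^3$ term indicates that runtime grows quickly as $c$ increases.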

Theorems & Definitions (30)

  • Lemma 1: Nyström-KSD Estimator
  • Remark 1
  • Theorem 1: Bounded case
  • Example 1: KSD yields unbounded kernel
  • Remark 2
  • Example 2: Applicability of Assumption \ref{ass:sub-gaussian}
  • Theorem 2: Consistency of Nyström-KSD
  • Corollary 1
  • Theorem 3: Consistency of KSD
  • Example 3: Assumption $\left\|{\left\|{h_p\left(\cdot,{X}\right)}\right\|_{\mathcal{H}_{h_p}}}\right\|_{\psi_2}< \infty$
  • ...and 20 more