A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics

Yijin Ni, Xiaoming Huo

TL;DR

The work develops a uniform concentration bound for kernel-based two-sample statistics that unifies ED, dCov, MMD, and HSIC under a single kernelized framework. It extends Maurer and Pontil's nonlinear-function concentration to two-sample settings, yielding explicit finite-sample and asymptotic error controls via Gaussian complexity and covering numbers. The results translate into concrete, nonasymptotic guarantees for dimension reduction, ICA, fairness constraints, generative modeling, and MMD-GANs, with kernel-specific verifications for common families. Overall, the approach provides a general, scalable tool for deriving performance guarantees in statistical learning problems that hinge on distributional discrepancies, enabling principled decisions about kernel choice and sample requirements.

Abstract

In many contemporary statistical and machine learning methods, one needs to optimize an objective function that depends on the discrepancy between two probability distributions. The discrepancy can be referred to as a metric for distributions. Widely adopted examples of such a metric include Energy Distance (ED), distance Covariance (dCov), Maximum Mean Discrepancy (MMD), and the Hilbert-Schmidt Independence Criterion (HSIC). We show that these metrics can be unified under a general framework of kernel-based two-sample statistics. This paper establishes a novel uniform concentration inequality for the aforementioned kernel-based statistics. Our results provide upper bounds for estimation errors in the associated optimization problems, thereby offering both finite-sample and asymptotic performance guarantees. As illustrative applications, we demonstrate how these bounds facilitate the derivation of error bounds for procedures such as distance covariance-based dimension reduction, distance covariance-based independent component analysis, MMD-based fairness-constrained inference, MMD-based generative model search, and MMD-based generative adversarial networks.
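As a concrete instance of the kernel-based two-sample statistics described above, the following is a minimal sketch of the standard unbiased U-statistic estimate of squared MMD with a Gaussian kernel. The function names (`gaussian_kernel`, `mmd2_unbiased`, `sample_gaussian`), the bandwidth, and the toy Gaussian data are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased U-statistic estimate of squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two points")
    # Within-sample terms exclude the diagonal (i == j) to remove bias.
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    # The cross term uses all m * n pairs.
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def sample_gaussian(n, mean, rng, dim=2):
    """Hypothetical toy data: n points with i.i.d. N(mean, 1) coordinates."""
    return [tuple(rng.gauss(mean, 1.0) for _ in range(dim)) for _ in range(n)]


rng = random.Random(0)
# Two samples from the same distribution: the estimate should be near zero.
same = mmd2_unbiased(sample_gaussian(200, 0.0, rng), sample_gaussian(200, 0.0, rng))
# Two samples whose means differ: the estimate should be clearly positive.
shifted = mmd2_unbiased(sample_gaussian(200, 0.0, rng), sample_gaussian(200, 1.0, rng))
print(f"same distribution: {same:.4f}, shifted mean: {shifted:.4f}")
```

Because the estimator is unbiased, it can take small negative values when the two distributions coincide; the concentration results summarized here concern how far such estimates can deviate from the population quantity.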

Paper Structure

This paper contains 59 sections, 36 theorems, 251 equations, 1 table.

Key Result

Lemma 4

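Let $f: \mathcal{X}^n \rightarrow \mathbb{R}$ be a function such that for all $i \in\{1, \ldots, n\}$, there exists $c_i<\infty$ for which

$$\sup_{x_1, \ldots, x_n,\, x_i^{\prime} \in \mathcal{X}} \left| f(x_1, \ldots, x_i, \ldots, x_n) - f(x_1, \ldots, x_i^{\prime}, \ldots, x_n) \right| \leq c_i.$$

Then, for any random variable $X$ taking values in $\mathcal{X}$, let $\mathbf{X}:=(X_1, \dots, X_n)^T$, where $X_i \stackrel{\text{i.i.d.}}{\sim} X$ for all $i$. For all $\delta \in (0, 1)$, with probability at least $1-\delta$ (stated here in the standard one-sided form of McDiarmid's inequality),

$$f(\mathbf{X}) - \mathbb{E} f(\mathbf{X}) \leq \sqrt{\frac{1}{2} \sum_{i=1}^n c_i^2 \log \frac{1}{\delta}},$$

where $f(\mathbf{X}) = f(X_1, \dots, X_n)$.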
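To make the bounded-differences condition concrete, the sketch below checks McDiarmid's inequality numerically for the sample mean of $n$ i.i.d. Uniform$[0,1]$ variables, where changing one coordinate moves the mean by at most $c_i = 1/n$. It assumes the standard one-sided form, in which $f(\mathbf{X}) - \mathbb{E} f(\mathbf{X}) \leq \sqrt{\tfrac{1}{2}\sum_i c_i^2 \log(1/\delta)}$ holds with probability at least $1-\delta$; the function names and the choice of $n$, $\delta$, and number of trials are illustrative assumptions, not part of the paper.

```python
import math
import random


def mcdiarmid_bound(c, delta):
    """One-sided deviation bound sqrt(0.5 * sum(c_i^2) * log(1 / delta))."""
    return math.sqrt(0.5 * sum(ci ** 2 for ci in c) * math.log(1.0 / delta))


def violation_rate(n, delta, trials, rng):
    """Fraction of trials where the sample mean exceeds its expectation by more than the bound."""
    c = [1.0 / n] * n  # bounded differences of the mean of [0, 1]-valued inputs
    eps = mcdiarmid_bound(c, delta)
    violations = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if mean - 0.5 > eps:  # E[mean] = 0.5 for Uniform[0, 1]
            violations += 1
    return violations / trials, eps


rng = random.Random(0)
rate, eps = violation_rate(n=50, delta=0.05, trials=5000, rng=rng)
print(f"bound eps = {eps:.4f}, empirical violation rate = {rate:.4f}")
```

The empirical violation rate should not exceed $\delta$; for the sample mean it is typically far smaller, since the inequality uses only the bounded-differences constants and no variance information.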

Theorems & Definitions (44)

  • Definition 1: Maximum Mean Discrepancy
  • Definition 2: Empirical MMD Estimators; Gretton et al. (2012)
  • Definition 3: Generalized MMD; Fukumizu et al. (2009)
  • Lemma 4: McDiarmid's Inequality; McDiarmid (1989)
  • Definition 5: Rademacher Complexity
  • Lemma 6
  • Definition 8: Gaussian Complexity
  • Proposition 9: Gaussian and Rademacher Complexity
  • Lemma 10: [Maurer and Pontil (2019)], Theorem 2, Corollary 3
  • Theorem 12
  • ...and 34 more