A Kernel Distribution Closeness Testing

Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu

TL;DR

The paper tackles distribution closeness testing (DCT) for complex data by addressing MMD's insensitivity when RKHS norms differ. It introduces norm-adaptive MMD (NAMMD), which scales the MMD value by the RKHS norms, and develops NAMMD-based DCT with an analytically derived null distribution and asymptotic normality. The authors prove type-I error control, provide a sample-complexity bound, and show that NAMMD-based DCT can achieve higher power than MMD-based DCT. Empirically, NAMMD improves performance on synthetic data and real-world tasks such as domain adaptation and adversarial robustness, and it accommodates kernel-fusion strategies for enhanced accuracy.

Abstract

Distribution closeness testing (DCT) assesses whether the distance between a distribution pair is at least $\epsilon$-far. Existing DCT methods mainly measure discrepancies between a distribution pair defined on discrete one-dimensional spaces (e.g., using total variation), which limits their applications to complex data (e.g., images). To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measurement of the distributional discrepancy between two complex distributions, into DCT scenarios. However, we find that MMD's value can be the same for many pairs of distributions that have different norms in the same reproducing kernel Hilbert space (RKHS), making MMD less informative when assessing the closeness levels for multiple distribution pairs. To mitigate the issue, we design a new measurement of distributional discrepancy, norm-adaptive MMD (NAMMD), which scales MMD's value using the RKHS norms of distributions. Based on the asymptotic distribution of NAMMD, we finally propose the NAMMD-based DCT to assess the closeness levels of a distribution pair. Theoretically, we prove that NAMMD-based DCT has higher test power compared to MMD-based DCT, with bounded type-I error, which is also validated by extensive experiments on many types of data (e.g., synthetic noise, real images). Furthermore, we also apply the proposed NAMMD to the two-sample testing problem and find that the NAMMD-based two-sample test has higher test power than the MMD-based two-sample test in both theory and experiments.
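To make the idea of scaling MMD by the RKHS norms concrete, here is a minimal NumPy sketch. It rests on assumptions not spelled out in the abstract: NAMMD is taken to be the squared MMD divided by $4K-\|\bm{\mu}_\mathbb{P}\|_{\mathcal{H}_\kappa}^2-\|\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2$ (the normalizer that appears in the paper's Lemma 2), the kernel is a Gaussian kernel with bound $K=1$, and all quantities are simple biased plug-in (V-statistic) estimates. The function names `gaussian_kernel` and `mmd_and_nammd` are hypothetical, and the estimator used in the paper may differ.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Kernel matrix exp(-||a_i - b_j||^2 / (2 * bandwidth^2)); values lie in (0, 1]."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_and_nammd(x, y, bandwidth=1.0, K=1.0):
    """Biased plug-in estimates of squared MMD and of the assumed NAMMD form.

    Assumption: NAMMD = MMD^2 / (4K - ||mu_P||^2 - ||mu_Q||^2),
    where K bounds the kernel (K = 1 for the Gaussian kernel above).
    """
    norm_p = gaussian_kernel(x, x, bandwidth).mean()   # estimate of ||mu_P||^2
    norm_q = gaussian_kernel(y, y, bandwidth).mean()   # estimate of ||mu_Q||^2
    cross = gaussian_kernel(x, y, bandwidth).mean()    # estimate of <mu_P, mu_Q>
    mmd2 = norm_p + norm_q - 2.0 * cross
    nammd = mmd2 / (4.0 * K - norm_p - norm_q)
    return mmd2, nammd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(200, 2))  # sample from P
    y = rng.normal(0.5, 1.0, size=(200, 2))  # sample from Q (shifted mean)
    mmd2, nammd = mmd_and_nammd(x, y)
    print(f"MMD^2 estimate: {mmd2:.4f}, NAMMD estimate: {nammd:.4f}")
```

Because each norm estimate is at most $K$, the denominator in this sketch lies between $2K$ and $4K$, so the ratio is always well defined. For a fixed MMD value, pairs with larger RKHS norms receive a larger NAMMD.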

Paper Structure

This paper contains 34 sections, 8 theorems, 140 equations, 9 figures, 7 tables, 2 algorithms.

Key Result

Lemma 2

If $\textnormal{NAMMD}(\mathbb{P},\mathbb{Q};\kappa)=\epsilon>0$, we have ... where $\sigma_{\mathbb{P},\mathbb{Q}}=\sqrt{4E[H_{1,2}H_{1,3}]-4(E[H_{1,2}])^2}/(4K-\|\bm{\mu}_\mathbb{P}\|_{\mathcal{H}_\kappa}^2-\|\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2)$, and the expectations are taken over $\bm{x}_1,\bm{x}_2,\bm{x}_3\sim\mathbb{P}^3$ and $\bm{y}_1,\bm{y}_2,\bm{y}_3\sim\dots$
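A short worked step may help explain why the denominator in $\sigma_{\mathbb{P},\mathbb{Q}}$ is always positive. This is a sketch, assuming the bounded-kernel condition $\kappa(\bm{x},\bm{x}')\leq K$ stated in the Figure 1 caption, and assuming that NAMMD is the squared MMD divided by that same denominator. The paper's own definition takes precedence over this reading.

$$
\begin{aligned}
\|\bm{\mu}_\mathbb{P}\|_{\mathcal{H}_\kappa}^2 &= E_{\bm{x},\bm{x}'\sim\mathbb{P}}[\kappa(\bm{x},\bm{x}')]\in[0,K], \qquad
\|\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2 = E_{\bm{y},\bm{y}'\sim\mathbb{Q}}[\kappa(\bm{y},\bm{y}')]\in[0,K], \\
2K &\leq 4K-\|\bm{\mu}_\mathbb{P}\|_{\mathcal{H}_\kappa}^2-\|\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2 \leq 4K, \\
\frac{\|\bm{\mu}_\mathbb{P}-\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2}{4K} &\leq \frac{\|\bm{\mu}_\mathbb{P}-\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2}{4K-\|\bm{\mu}_\mathbb{P}\|_{\mathcal{H}_\kappa}^2-\|\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2} \leq \frac{\|\bm{\mu}_\mathbb{P}-\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2}{2K}.
\end{aligned}
$$

Under this assumed form, two pairs with the same MMD value but different RKHS norms receive different normalized values: larger norms shrink the denominator and so increase the ratio.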

Figures (9)

  • Figure 1: MMD is less informative when two distributions are different. All visualizations use a constant MMD value $\|\bm{\mu}_\mathbb{P}-\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2=0.15$ with the Gaussian kernel of bandwidth 1; the construction extends to other kernels of the form $\kappa(\bm{x},\bm{x}')=\Psi(\bm{x}-\bm{x}')\leq K$ with $K>0$ for a positive-definite $\Psi(\cdot)$ and $\Psi(\bm{0})=K$ (the limitation statement regarding kernel forms can be found in \ref{app:Limits}). Subfigures (a) and (b) depict distributions $\mathbb{P}$ and $\mathbb{Q}$ with different norms ($\|\bm{\mu}_\mathbb{P}\|_{\mathcal{H}_\kappa}^2$ and $\|\bm{\mu}_\mathbb{Q}\|_{\mathcal{H}_\kappa}^2$), yet both subfigures yield the same MMD value, indicating that MMD is less informative. Subfigure (c) presents the MMD value and the $p$-values of its estimator in two-sample testing (TST). Subfigure (d) presents the NAMMD value and the $p$-values of its estimator in TST. NAMMD exhibits a stronger correlation with the $p$-value than MMD does: a larger NAMMD corresponds to a smaller $p$-value, while MMD keeps the same value as the $p$-value changes.
  • Figure 2: Comparison of test power vs. sample size for our NAMMDFuse and state-of-the-art two-sample tests.
  • Figure 3: Comparisons in distinguishing the closeness levels between the original and variants of ImageNet.
  • Figure 4: Comparison of NAMMD-based DCT and MMD-based DCT in detecting the confidence margin between the ImageNet and ImageNetV2 datasets.
  • Figure 5: Comparison of the performance of NAMMD-based DCT and MMD-based DCT in detecting adversarial perturbations on the CIFAR-10 dataset.
  • ...and 4 more figures

Theorems & Definitions (19)

  • Definition 1
  • Lemma 2
  • Definition 3
  • Lemma 4
  • Lemma 5
  • Theorem 6
  • Theorem 7
  • Definition 8
  • Theorem 9
  • Definition 10
  • ...and 9 more