A Kernel Distribution Closeness Testing

Zhijian Zhou; Liuhua Peng; Xunye Tian; Feng Liu

TL;DR

The paper tackles distribution closeness testing (DCT) for complex data by addressing MMD's insensitivity when RKHS norms differ. It introduces norm-adaptive MMD (NAMMD), which scales the MMD value by the RKHS norms, and develops NAMMD-based DCT with an analytically derived null distribution and asymptotic normality. The authors prove type-I error control, provide a sample-complexity bound, and show that NAMMD-based DCT can achieve higher power than MMD-based DCT. Empirically, NAMMD improves performance on synthetic data and real-world tasks such as domain adaptation and adversarial robustness, and it accommodates kernel-fusion strategies for enhanced accuracy.

Abstract

Distribution closeness testing (DCT) assesses whether the two distributions in a pair are at least $\epsilon$-far from each other. Existing DCT methods mainly measure discrepancies between a distribution pair defined on discrete one-dimensional spaces (e.g., using total variation), which limits their application to complex data (e.g., images). To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measure of the discrepancy between two complex distributions, into DCT scenarios. However, we find that MMD can take the same value for many pairs of distributions that have different norms in the same reproducing kernel Hilbert space (RKHS), making MMD less informative when assessing the closeness levels of multiple distribution pairs. To mitigate this issue, we design a new measure of distributional discrepancy, norm-adaptive MMD (NAMMD), which scales MMD's value using the RKHS norms of the distributions. Based on the asymptotic distribution of NAMMD, we propose the NAMMD-based DCT to assess the closeness level of a distribution pair. Theoretically, we prove that NAMMD-based DCT has higher test power than MMD-based DCT, with bounded type-I error, which is also validated by extensive experiments on many types of data (e.g., synthetic noise, real images). Furthermore, we apply NAMMD to the two-sample testing problem and find that the NAMMD-based two-sample test has higher test power than the MMD-based two-sample test in both theory and experiments.
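To make the idea of scaling MMD by RKHS norms concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not give the NAMMD formula, so the ratio used here, squared MMD divided by $4K - \|\mu_P\|^2 - \|\mu_Q\|^2$ for a kernel bounded by $K$, is an assumption that should be checked against the paper's definition. The function names, the Gaussian kernel, and the fixed bandwidth are also illustrative choices.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix; kappa(x, x) = 1, so the kernel is bounded by K = 1."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def offdiag_mean(k):
    """Mean of the off-diagonal entries of a square kernel matrix."""
    n = k.shape[0]
    return (k.sum() - np.trace(k)) / (n * (n - 1))


def mmd2_and_norm_scaled(x, y, bandwidth=1.0, kernel_bound=1.0):
    """Return (squared-MMD estimate, norm-scaled estimate).

    ASSUMPTION: the norm-scaled value divides the squared MMD by
    4K - ||mu_P||^2 - ||mu_Q||^2; this form is not stated in the abstract.
    """
    norm_p = offdiag_mean(gaussian_kernel(x, x, bandwidth))  # estimates ||mu_P||^2
    norm_q = offdiag_mean(gaussian_kernel(y, y, bandwidth))  # estimates ||mu_Q||^2
    cross = gaussian_kernel(x, y, bandwidth).mean()          # estimates <mu_P, mu_Q>
    mmd2 = norm_p + norm_q - 2.0 * cross
    scaled = mmd2 / (4.0 * kernel_bound - norm_p - norm_q)
    return mmd2, scaled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 2))
    y_close = rng.normal(size=(200, 2))        # same distribution as x
    y_far = rng.normal(size=(200, 2)) + 2.0    # mean-shifted distribution
    print("close pair:", mmd2_and_norm_scaled(x, y_close))
    print("far pair:  ", mmd2_and_norm_scaled(x, y_far))
```

Under this assumed form, two pairs with the same squared MMD but different RKHS norms receive different scaled values, which is the behavior the abstract describes as the motivation for NAMMD.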
