Universally Consistent K-Sample Tests via Dependence Measures

Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe, Joshua T. Vogelstein

TL;DR

Independence tests are proved to achieve universally consistent K-sample testing, and K-sample statistics such as Energy and Maximum Mean Discrepancy (MMD) are shown to be precisely equivalent to distance correlation (Dcorr).
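
The two-sample Energy case of this equivalence can be checked numerically. The sketch below is an illustration, not the paper's code: it assumes the biased (V-statistic) forms of both quantities, Euclidean distances, and a scalar 0/1 group label, and it works with squared distance covariance, the unnormalized form of Dcorr. Under those assumptions, for samples of sizes n and m, the pooled distance covariance equals the energy statistic multiplied by 2n²m²/(n+m)⁴. All function names are hypothetical, and only the Python standard library is used.

```python
import math
import random


def pairwise(xs, ys):
    """Euclidean distance matrix between two lists of points (tuples)."""
    return [[math.dist(x, y) for y in ys] for x in xs]


def matrix_mean(mat):
    """Mean of all entries of a rectangular matrix."""
    return sum(map(sum, mat)) / (len(mat) * len(mat[0]))


def energy_statistic(x, y):
    """Two-sample energy statistic in its biased (V-statistic) form."""
    return (2 * matrix_mean(pairwise(x, y))
            - matrix_mean(pairwise(x, x))
            - matrix_mean(pairwise(y, y)))


def double_center(d):
    """Subtract row and column means of a symmetric matrix, add the grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def dcov_sq(u, v):
    """Biased sample distance covariance (squared) between paired samples."""
    a = double_center(pairwise(u, u))
    b = double_center(pairwise(v, v))
    n = len(u)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


rng = random.Random(0)
x = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(30)]
y = [(rng.gauss(0.5, 1.0), rng.gauss(0.0, 2.0)) for _ in range(20)]

# Pool the two samples and attach a 0/1 label recording group membership.
u = x + y
v = [(0.0,)] * len(x) + [(1.0,)] * len(y)

n, m = len(x), len(y)
lhs = dcov_sq(u, v)
rhs = 2 * n**2 * m**2 / (n + m) ** 4 * energy_statistic(x, y)
print(lhs, rhs)  # the two values agree up to floating-point error
```

Because the scaling factor depends only on the sample sizes, a permutation test that shuffles the labels gives the same p-value for either statistic. With one-hot labels in two dimensions instead of a scalar 0/1 label, the label distances become √2 rather than 1, so the constant picks up an extra factor of √2.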

Abstract

The K-sample testing problem involves determining whether K groups of data points are each drawn from the same distribution. Analysis of variance is arguably the most classical method for testing mean differences, and several recent methods test for distributional differences. In this paper, we demonstrate the existence of a transformation that allows K-sample testing to be carried out using any dependence measure. Consequently, universally consistent K-sample testing can be achieved using a universally consistent dependence measure, such as distance correlation or the Hilbert-Schmidt independence criterion. This enables a wide range of dependence measures to be easily applied to K-sample testing.

Paper Structure

This paper contains 10 sections, 9 theorems, 37 equations, 3 figures.

Key Result

Theorem 1

Let $(U_1, U_2, \ldots,U_K)$ be $K$ random variables. Let $V \in \mathbb{R}^{K}$ follow the multinomial distribution with probabilities $(\pi_1, \pi_2, \ldots,\pi_K)$, where $\pi_k \in (0,1)$ and $\sum_{k=1}^{K}\pi_k=1$. Let $U$ be the following mixture distribution: $U = \sum_{k=1}^{K} V_k U_k$, where $V_k$ denotes the $k$th dimension of $V$. Then, $F_{UV} = F_U F_V$ if and only if $F_{U_1} = F_{U_2} = \cdots = F_{U_K}$.
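
The theorem suggests a simple recipe: pool the K samples into one sample of $U$, encode group membership as one-hot vectors playing the role of $V$, and test whether the two are independent. The sketch below is one way to do this, not the authors' implementation. It assumes biased sample distance covariance as the dependence measure, Euclidean distances, and a permutation test for the p-value; the function names, the number of permutations, and the simulated Gaussian data are all hypothetical choices made for illustration.

```python
import math
import random


def centered_distances(points):
    """Double-centered Euclidean distance matrix of a list of points."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def k_sample_dcov_test(groups, n_perm=200, seed=0):
    """Permutation test of F_{U_1} = ... = F_{U_K} via distance covariance.

    groups: list of K lists of points (tuples of floats).
    Returns (statistic, p_value).
    """
    rng = random.Random(seed)
    k = len(groups)
    # Pool the samples into U and record one-hot group labels as V.
    u = [p for g in groups for p in g]
    v = [tuple(1.0 if j == i else 0.0 for j in range(k))
         for i, g in enumerate(groups) for _ in g]
    n = len(u)
    a = centered_distances(u)
    b = centered_distances(v)

    def stat(order):
        # Biased sample distance covariance with the labels reordered.
        return sum(a[i][j] * b[order[i]][order[j]]
                   for i in range(n) for j in range(n)) / n**2

    idx = list(range(n))
    observed = stat(idx)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # shuffling labels simulates the null of independence
        if stat(idx) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


rng = random.Random(1)


def sample(mean, n=20):
    """Draw n two-dimensional Gaussian points with the given first-coordinate mean."""
    return [(rng.gauss(mean, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


same = [sample(0.0), sample(0.0), sample(0.0)]
shifted = [sample(0.0), sample(0.0), sample(3.0)]

stat_same, p_same = k_sample_dcov_test(same)
stat_shift, p_shift = k_sample_dcov_test(shifted)
print(p_same, p_shift)
```

Any other dependence measure could replace the `stat` function without changing the rest of the procedure, which is the practical point of the transformation. Permuting the rows and columns of the centered label matrix is equivalent to re-centering the permuted labels, so the centering is done only once.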

Figures (3)

  • Figure 1: The figure compares the testing power of Anova, Dcov, and Hsic for three different Gaussian-simulated sample datasets.
  • Figure F1: This figure presents the K-sample testing power at a type-1 error level of $0.05$ as the sample size increases, for 20 different distribution settings, using several universally consistent dependence measures, two linear correlations, and Manova.
  • Figure F2: This figure visualizes the 2-dimensional distributions used in Figure \ref{fig1}. We sample $500$ points from the first distribution $F_{U_1}$ (black dots), then rotate 60 degrees clockwise to produce $F_{U_2}$ and 60 degrees counter-clockwise to produce $F_{U_3}$, marked by lighter dots in each case.

Theorems & Definitions (13)

  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 4
  • proof
  • Theorem 4
  • proof
  • Theorem 4
  • ...and 3 more