
On Ranking-based Tests of Independence

Myrto Limnios, Stéphan Clémençon

Abstract

In this paper we develop a novel nonparametric framework to test the independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dx\, dy)$, based on Receiver Operating Characteristic (ROC) analysis and bipartite ranking. The rationale behind our approach is that the independence hypothesis $\mathcal{H}_0$ is necessarily false as soon as the optimal scoring function related to the pair of distributions $(H\otimes G,\; F)$, obtained from a bipartite ranking algorithm, has a ROC curve that deviates from the main diagonal of the unit square. To build tests of independence, we consider a wide class of rank statistics encompassing many ways of deviating from the diagonal in the ROC space. Beyond its great flexibility, this new method has theoretical properties that far surpass those of its competitors: nonasymptotic bounds are established for the two types of testing errors. From an empirical perspective, the procedure we promote in this paper exhibits a remarkable ability to detect small departures, of various types, from the null assumption $\mathcal{H}_0$, even in high dimension, as supported by the numerical experiments presented here.
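The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the three-way split used to build a sample from $F$ and an independent sample from $H\otimes G$, the mean-difference scorer on cross-product features (a simple stand-in for a bipartite ranking algorithm), and the one-sided Monte Carlo permutation $p$-value. All function names are hypothetical. The test statistic is the Mann-Whitney-Wilcoxon statistic rescaled to the empirical AUC, which stays close to $1/2$ when the ROC curve of the learned scorer stays close to the diagonal.

```python
import numpy as np

def independent_two_samples(x, y):
    """Positives: intact pairs (draws from F). Negatives: x from the second
    third paired with y from the last third (draws from the product of marginals)."""
    m = len(x) // 3
    pos = np.hstack([x[:m], y[:m]])
    neg = np.hstack([x[m:2 * m], y[2 * m:3 * m]])
    return pos, neg

def features(z, q):
    """Hypothetical feature map: raw coordinates plus all products x_i * y_j."""
    x, y = z[:, :q], z[:, q:]
    cross = (x[:, :, None] * y[:, None, :]).reshape(len(z), -1)
    return np.hstack([z, cross])

def fit_mean_difference_scorer(pos, neg, q):
    """Stand-in scoring function (an assumption, not a ranking forest):
    project the features onto the difference of the two class means."""
    w = features(pos, q).mean(axis=0) - features(neg, q).mean(axis=0)
    return lambda z: features(z, q) @ w

def mww_auc(s_pos, s_neg):
    """Mann-Whitney-Wilcoxon statistic rescaled to [0, 1] (empirical AUC).
    Ties are assumed absent (continuous scores)."""
    pooled = np.concatenate([s_pos, s_neg])
    ranks = np.empty(len(pooled))
    ranks[np.argsort(pooled)] = np.arange(1, len(pooled) + 1)
    m, n = len(s_pos), len(s_neg)
    return (ranks[:m].sum() - m * (m + 1) / 2) / (m * n)

def ranking_independence_test(x, y, n_perm=200, seed=0):
    """Learn a scorer on the first half, then test on the second half."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    q, half = x.shape[1], len(x) // 2
    pos_tr, neg_tr = independent_two_samples(x[:half], y[:half])
    pos_te, neg_te = independent_two_samples(x[half:], y[half:])
    score = fit_mean_difference_scorer(pos_tr, neg_tr, q)
    s_pos, s_neg = score(pos_te), score(neg_te)
    auc_obs = mww_auc(s_pos, s_neg)
    # Under independence both test samples share one distribution, so the
    # pooled scores are exchangeable and label permutations give a valid null.
    pooled, m = np.concatenate([s_pos, s_neg]), len(s_pos)
    null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(pooled)
        null[b] = mww_auc(perm[:m], perm[m:])
    p_value = (1 + np.sum(null >= auc_obs)) / (n_perm + 1)
    return auc_obs, p_value

# Toy usage: a Gaussian pair with linear correlation 0.8.
rng = np.random.default_rng(1)
x = rng.standard_normal(3000)
y = 0.8 * x + 0.6 * rng.standard_normal(3000)
auc_dep, p_dep = ranking_independence_test(x, y)
```

Splitting the data before learning the scorer keeps the test scores independent of the training step, which is what justifies the permutation null in this sketch. A richer ranking algorithm would replace `fit_mean_difference_scorer` without changing the rest.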

Paper Structure

This paper contains 43 sections, 7 theorems, 41 equations, 8 figures, 2 tables.

Key Result

Theorem 1

The following assertions are equivalent. In addition, we have:

Figures (8)

  • Figure 1: Left: Joint Gaussian density for $\rho=0.20$ of $(X^1, Y^1)$. Right: Plots of the optimal $\rm ROC$ curves for two Gaussian vectors with linear correlation $\rho \in\{0.0, \; 0.05, \; 0.10, \; 0.15, \; 0.20\}$ and $q=l=5$.
  • Figure 2: Ranking-based independence rank test.
  • Figure 3: Plots of the rejection rate under ${\cal H}_0$ (a) and ${\cal H}_1$ (b-e) against the significance level $\alpha\in(0,1)$ for (GL) with $\phi(u)=u$ (rForest$_{MWW}$): $\rho = 0.0$ (a), $\rho = 0.2$ (b), $\rho = 0.3$ (c), $\rho = 0.4$ (d), $\rho = 0.5$ (e). The parameters are fixed to $N=1000$, $d=4$, $K_p=10$, $K_0=200$, $B=100$ for all experiments.
  • Figure 9: Plots of the rejection rate under ${\cal H}_0$ (a) and ${\cal H}_1$ (b-d) against the significance level $\alpha\in(0,1)$ for (GL) with $\phi(u)=u$ (rForest$_{MWW}$): $\rho = 0.0$ (a), $\rho = 0.10$ (b), $\rho = 0.15$ (c), $\rho = 0.20$ (d). The parameters are fixed to $N=1000$, $d=10$, $K_p=10$, $K_0=200$, $B=100$ for all experiments.
  • Figure 14: Boxplots of the $p$-values for different sets of protected attributes. The experimental parameters are fixed to $N=10^3$, $K_p=10$, $q=1$, $l\in \{2,3\}$, with $5$-fold cross-validation and $31$ features, based on the open-source dataset available in [jesus2022turning].
  • ...and 3 more figures

Theorems & Definitions (12)

  • Theorem 1
  • Example 1
  • Proposition 1
  • Proposition 2
  • Definition 1
  • Definition 2
  • Theorem 2
  • Lemma 3
  • Proposition 3
  • Definition 3
  • ...and 2 more