The exact region and an inequality between Chatterjee's and Spearman's rank correlations

Jonathan Ansari, Marcus Rockel

Abstract

The rank correlation ξ(X,Y), recently established by Sourav Chatterjee and already popular in the statistics literature, takes values in [0,1], where 0 characterizes independence of X and Y, and 1 characterizes perfect dependence of Y on X. Unlike concordance measures such as Spearman's ρ, which capture the degree of positive or negative dependence, ξ quantifies the strength of functional dependence. In this paper, we study the attainable set of pairs (ξ(X,Y),ρ(X,Y)). The resulting ξ-ρ-region is a convex set whose boundary is characterized by a novel family of absolutely continuous, asymmetric copulas having a diagonal band structure. Moreover, we prove that ξ(X,Y) ≤ |ρ(X,Y)| whenever Y is stochastically increasing or decreasing in X, and we identify the maximal difference ρ(X,Y)-ξ(X,Y) as exactly 0.4. Our proofs rely on a convex optimization problem under various equality and inequality constraints, as well as on ordering properties for ξ and ρ. Our results contribute to a better understanding of Chatterjee's rank correlation, which typically yields substantially smaller values than Spearman's ρ when quantifying positive dependencies. In particular, when interpreting the values of Chatterjee's rank correlation on the scale of ρ, the quantity √ξ appears to be more appropriate.
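For orientation, the two measures compared in the abstract can be written as functionals of the copula $C$ of $(X,Y)$ when both margins are continuous. The expressions below are the standard copula representations, stated here as background rather than quoted from the paper; $\partial_1 C$ denotes the partial derivative with respect to the first argument, and $\Pi(u,v)=uv$, $M(u,v)=\min\{u,v\}$, $W(u,v)=\max\{u+v-1,0\}$ are the independence copula and the two Fréchet bounds.

```latex
\begin{align*}
  \rho(C) &= 12 \int_0^1\!\!\int_0^1 C(u,v)\,\mathrm{d}u\,\mathrm{d}v \;-\; 3, \\
  \xi(C)  &= 6 \int_0^1\!\!\int_0^1 \bigl(\partial_1 C(u,v)\bigr)^2\,\mathrm{d}u\,\mathrm{d}v \;-\; 2 .
\end{align*}
% Sanity checks at the three reference copulas:
%   Pi: \partial_1 \Pi = v,            so \xi = 6 \cdot 1/3 - 2 = 0,  \rho = 12 \cdot 1/4 - 3 = 0
%   M : \partial_1 M = 1_{\{u < v\}},    so \xi = 6 \cdot 1/2 - 2 = 1,  \rho = 12 \cdot 1/3 - 3 = 1
%   W : \partial_1 W = 1_{\{u + v > 1\}}, so \xi = 6 \cdot 1/2 - 2 = 1,  \rho = 12 \cdot 1/6 - 3 = -1
```

The checks show why the two measures behave differently: $\rho$ separates $M$ from $W$ by sign, whereas $\xi$ assigns both the value 1 because each describes perfect functional dependence.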

Paper Structure

This paper contains 13 sections, 17 theorems, 50 equations, 5 figures, 1 table.

Key Result

Theorem 1

The exact $\xi$-$\rho$-region is where The set $\mathcal{R}$ is convex and its boundary is described by the copula family $(C_b)_{b\in{\mathbb{R}}\setminus \{0\}}$ defined in eq:C_x and defcopbneg. More precisely, for $x\in (0,1)$ and for $b = b_x$ in eq:b_and_M, the copula $C_{\pm b}$ is the unique copula $C\in \mathcal{C}$ with $\xi(C)

Figures (5)

  • Figure 1: Illustration of the exact $\xi$-$\rho$-region $\mathcal{R}$ (transposed) in \ref{eqxirhoregion}, where stochastically increasing (decreasing) copulas are located in the right (left) scattered area; see Theorem \ref{thm:rho_ge_xi}. The right and left boundaries $(\pm M_\xi,\xi)$ of the region are described by the copula family $(C_b)_{b\in {\mathbb{R}}\setminus\{0\}}$ with limiting cases $\Pi(u,v) := uv$ for $b\to 0,$ $M(u,v):= \min\{u,v\}$ for $b\to \infty,$ and $W(u,v) := \max\{u+v - 1,0\}$ for $b\to -\infty;$ see Theorem \ref{thexirhooptimisation} and Remark \ref{remsymm}\ref{remsymma}.
  • Figure 2: Left: $(\rho, \xi)$ (solid) and $(\rho, \sqrt{\xi})$ (dotted) for classical stochastically increasing copula families. Symmetric copula families are more similar to the boundary copulas $(C_b)_{b\in\mathbb{R}\setminus\{0\}}$, whereas the asymmetric Marshall-Olkin copula with $\alpha_1=1$ and varying $\alpha_2$ is closer to the diagonal. In particular, it is an example of an SI copula family for which, in general, $\sqrt{\xi} \not\le \rho$. Right: Comparison of rank correlations for well-known SI copula families. The curves show the difference between Spearman's rho (solid lines) or Kendall's tau (dashed lines) and Chatterjee's xi, plotted against Chatterjee's xi. The solid blue curve represents the copula family $(C_b)_{b>0}$ defined in \ref{eq:C_x}, which yields the maximum possible difference between Spearman's rho and Chatterjee's xi.
  • Figure 3: The class of bivariate copulas with level sets of Spearman’s $\rho$ (left) and Chatterjee’s $\xi$ (right) around the independence copula $\Pi.$ Constant values of $\rho$ appear as horizontal stripes; the extremes $\rho=\pm1$ occur only at the Fréchet copulas $M$ and $W.$ In contrast, constant values of $\xi$ appear as concentric circles centred at $\Pi,$ expanding from $\xi=0$ (independence) to $\xi=1$ (perfect directed dependence). The blue and red lenses highlight stochastically increasing (SI) and decreasing (SD) copulas, respectively, which form convex subsets of the set of all copulas $\mathcal{C}.$
  • Figure 4: Comparison of Spearman's rank correlation $\rho$ and Chatterjee's rank correlation $\xi$ across different functional dependencies and noise levels. The first row corresponds to the linear model $Y_1 = r X + (1-r) \varepsilon$, the second row to the quadratic model $Y_2 = r X^2 + (1-r) \varepsilon$, and the third row to the sinusoidal model $Y_3 = r(4\sin(-X^2) + (2-X)(2+X)\mathds{1}_{\{X>0\}}) + (1-r) \varepsilon$, where $X$ is uniform on $(-\pi,\pi)$ and independent of the standard normal error $\varepsilon$. The columns correspond to an increasing signal parameter $r \in \{0.1, 0.4, 0.9\}$ and hence decreasing noise. The figure illustrates that $\rho$ effectively detects the strength of monotonic (in particular, linear) dependence (row 1). For non-monotonic dependence, it may attain the value zero, as illustrated in rows 2 and 3. In contrast, $\xi$ increases with $r$ across all models, reflecting the strength of the functional dependence regardless of monotonicity.
  • Figure 5: The density $c_b$ (left) and the derivative $t\mapsto h_v(t) = \partial_1 C_b(t,v)$ (right) of the copula $C_b$ for $b=0.5$ (top), $b=1$ (middle) and $b=5$ (bottom). $s_v$ is the (hypothetical) upper boundary of the band and $a_v$ is the (hypothetical) lower boundary of the band, visualised in the densities for $v=0.6.$ The support of the densities on the left is just the intersection of the diagonal band $S$ in \ref{eq:diag_band} and the unit square $[0,1]^2$. Note that the density is zero outside the band; see \ref{eq:density_cb}.
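
The comparison described in the Figure 4 caption can be reproduced roughly with a few lines of standard-library Python. This is a minimal sketch and not the authors' code: the helper names (`ranks`, `spearman_rho`, `chatterjee_xi`, `simulate`), the sample size `n=2000` and the seed are assumptions made for illustration. The estimator of $\xi$ is Chatterjee's rank-based statistic for data without ties, $\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i|/(n^2-1)$, where $r_i$ is the rank of the $Y$ value paired with the $i$-th smallest $X$ value.

```python
import math
import random


def ranks(values):
    """Ranks 1..n of the values (assumes no ties, as for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Sample Spearman's rho via the rank-difference formula (no ties)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def chatterjee_xi(x, y):
    """Chatterjee's xi_n for data without ties: order the pairs by x,
    then sum the absolute jumps between consecutive ranks of y."""
    n = len(x)
    ry = ranks(y)
    by_x = sorted(range(n), key=lambda i: x[i])
    jumps = sum(abs(ry[by_x[i + 1]] - ry[by_x[i]]) for i in range(n - 1))
    return 1.0 - 3.0 * jumps / (n * n - 1)


def simulate(model, r, n=2000, seed=0):
    """Draw n points from one of the three models in the Figure 4 caption.

    n and seed are illustrative choices, not values taken from the paper.
    """
    rng = random.Random(seed)
    signal = {
        "linear": lambda t: t,
        "quadratic": lambda t: t * t,
        "sinusoidal": lambda t: 4 * math.sin(-t * t)
        + ((2 - t) * (2 + t) if t > 0 else 0.0),
    }[model]
    x = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
    y = [r * signal(t) + (1 - r) * rng.gauss(0.0, 1.0) for t in x]
    return x, y


if __name__ == "__main__":
    for model in ("linear", "quadratic", "sinusoidal"):
        for r in (0.1, 0.4, 0.9):
            x, y = simulate(model, r)
            print(f"{model:10s} r={r}: rho={spearman_rho(x, y):+.3f} "
                  f"xi={chatterjee_xi(x, y):+.3f}")
```

For a perfectly monotone sample of size $n$ the statistic equals $1 - 3/(n+1)$ rather than 1, so small samples understate strong dependence; this finite-sample effect is separate from the population-level gap between $\xi$ and $\rho$ studied in the paper.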

Theorems & Definitions (36)

  • Theorem 1: Exact $\xi$-$\rho$-region
  • Corollary 1: Global $\rho-\xi$-maximum
  • Theorem 2: SI implies $\xi\le \rho$
  • Remark 1
  • Example 1: PLOD is not sufficient for $\xi(C)\le \rho(C)$
  • Lemma 1: A characterisation of copulas
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • ...and 26 more