A direct extension of Azadkia & Chatterjee's rank correlation to multi-response vectors
Jonathan Ansari, Sebastian Fuchs
TL;DR
This work directly generalizes Chatterjee's rank correlation $ξ$ to multivariate responses by introducing a scale-invariant predictability measure $T$, built by converting a multivariate regression problem into a univariate conditional-dependence framework. $T$ satisfies the core axioms of a measure of predictability, the information gain inequality, and a conditional-independence characterization, while admitting a fast, nonparametric, rank-based estimator $T_n$ with asymptotic normality. A permutation-invariant variant $\bar{T}$ extends applicability to unordered response components, and closed-form results for multivariate normal models provide intuition on dependence structure. The authors leverage these properties to develop MFOCI, a model-free, tuning-parameter-free multivariate feature ordering and selection method that scales to multi-output data and demonstrates strong performance against existing approaches in simulations and real data. Overall, the framework enables efficient, interpretable quantification of dependence and robust variable selection for multi-outcome problems across domains.
Abstract
Recently, Chatterjee (2023) recognized the lack of a direct generalization of his rank correlation $ξ$ in Azadkia and Chatterjee (2021) to a multi-dimensional response vector. As a natural solution to this problem, we here propose an extension of $ξ$ that is applicable to a set of $q \geq 1$ response variables, where our approach builds upon converting the original vector-valued problem into a univariate problem and then applying the rank correlation $ξ$ to it. Our novel measure $T$ quantifies the scale-invariant extent of functional dependence of a response vector $\mathbf{Y} = (Y_1,\dots,Y_q)$ on predictor variables $\mathbf{X} = (X_1, \dots,X_p)$, characterizes independence of $\mathbf{X}$ and $\mathbf{Y}$ as well as perfect dependence of $\mathbf{Y}$ on $\mathbf{X}$ and hence fulfills all the characteristics of a measure of predictability. Aiming at maximum interpretability, we provide various invariance results for $T$ as well as a closed-form expression in multivariate normal models. Building upon the graph-based estimator for $ξ$ in Azadkia and Chatterjee (2021), we obtain a non-parametric, strongly consistent estimator for $T$ and show -- as a main contribution -- its asymptotic normality. Based on this estimator, we develop a model-free and rank-based feature ranking and forward feature selection for multiple-outcome data that works without any tuning parameters. Simulation results and real case studies illustrate $T$'s broad applicability.
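As a rough illustration of the building block the abstract refers to, the following Python sketch computes a nearest-neighbour, rank-based dependence estimate for a single scalar response, in the spirit of the graph-based estimator of Azadkia and Chatterjee (2021). It is not the authors' estimator $T_n$ for vector responses. The function name `ac_dependence`, the brute-force neighbour search, and the assumption of continuous, tie-free data are choices made here for illustration only.

```python
import numpy as np


def ac_dependence(x, y):
    """Nearest-neighbour, rank-based estimate of how well a scalar y is
    predicted by x (hypothetical helper; assumes continuous data without ties).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D predictor as n rows with one column
    n = y.shape[0]

    # R_i = #{j : y_j <= y_i} and L_i = #{j : y_j >= y_i}
    y_sorted = np.sort(y)
    R = np.searchsorted(y_sorted, y, side="right").astype(float)
    L = (n - np.searchsorted(y_sorted, y, side="left")).astype(float)

    # Index of the nearest neighbour of each x_i among the other points.
    # Brute force, O(n^2) memory and time: fine for a sketch, not for large n.
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)

    num = np.sum(n * np.minimum(R, R[nn]) - L ** 2)
    den = np.sum(L * (n - L))
    return num / den


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    print("y = sin(x):      ", ac_dependence(x, np.sin(x)))
    print("y independent:   ", ac_dependence(x, rng.normal(size=500)))
```

Values near 0 suggest that the predictors carry little information about the response, while values near 1 suggest that the response is close to a function of the predictors. The quadratic distance matrix keeps the sketch free of extra dependencies; a k-d tree would be the natural replacement for larger samples.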
