Azadkia-Chatterjee's dependence coefficient for infinite dimensional data
Siegfried Hörmann, Daniel Strenger
TL;DR
This work extends Azadkia-Chatterjee's dependence coefficient to covariates $X$ taking values in general metric spaces, including infinite-dimensional functional data, and analyzes the associated nearest-neighbor (NN) estimator. It shows that NN degrees can diverge polynomially in function spaces, which undermines the standard asymptotic theory, and then establishes a data-dependent, self-normalized CLT for an independence test that remains universally consistent under mild conditions. The paper provides verifiable conditions for Gaussian functional data and demonstrates the method on age-structure curves of Austrian municipalities paired with COVID-19 vaccination data, where it finds strong dependence and favorable computational performance. These results offer guidance for applying graph-based dependence measures in infinite-dimensional settings and highlight the need to account for growing NN degrees in practice.
Abstract
We extend the scope of Azadkia-Chatterjee's dependence coefficient between a scalar response $Y$ and a multivariate covariate $X$ to the case where $X$ takes values in a general metric space. Particular attention is paid to the case where $X$ is a curve. Although extending this framework at the population level is relatively straightforward, analyzing the asymptotic behavior of the estimator is considerably more difficult. The difficulty stems largely from the nearest-neighbor structure of the infinite-dimensional covariate sample, which leads us to a topic that has not previously been addressed in the literature. The primary contribution of this paper is to provide insights into this issue and to propose strategies for addressing it. Our findings also have significant implications for other graph-based methods facing similar challenges.
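To make the role of the nearest-neighbor structure concrete, the following is a minimal sketch of the unconditional Azadkia-Chatterjee estimator computed from a pairwise distance matrix, so that any metric on $X$ can be plugged in. The statistic is the form given in Azadkia and Chatterjee's original work; the function names, the discretized $L^2$ distance for curves, and the deterministic tie-breaking are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np


def l2_distance_matrix(curves, dt):
    """Pairwise L2 distances between curves sampled on an equispaced grid.

    Hypothetical helper: `curves` has shape (n, m), `dt` is the grid spacing,
    and the integral is approximated by a Riemann sum.
    """
    curves = np.asarray(curves, dtype=float)
    diff = curves[:, None, :] - curves[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2) * dt)


def ac_coefficient(dist, y):
    """Unconditional Azadkia-Chatterjee estimator from a distance matrix.

    dist : (n, n) array of pairwise distances between the covariates X_i
    y    : (n,) array of scalar responses
    """
    dist = np.asarray(dist, dtype=float).copy()
    y = np.asarray(y, dtype=float)
    n = y.shape[0]

    # Exclude each point from its own neighbor search.
    np.fill_diagonal(dist, np.inf)
    # Index of the nearest neighbor of X_i. Assumption: ties go to the
    # lowest index here, whereas random tie-breaking is the usual choice.
    nn = dist.argmin(axis=1)

    # R_i = #{j : Y_j <= Y_i},  L_i = #{j : Y_j >= Y_i}
    r = (y[None, :] <= y[:, None]).sum(axis=1)
    l = (y[None, :] >= y[:, None]).sum(axis=1)

    num = np.sum(n * np.minimum(r, r[nn]) - l ** 2)
    den = np.sum(l * (n - l))
    return num / den


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 300, 50
    grid = np.linspace(0.0, 1.0, m)
    dt = grid[1] - grid[0]
    # Simulated random-walk curves as stand-ins for functional covariates.
    curves = np.cumsum(rng.normal(size=(n, m)), axis=1) * np.sqrt(dt)
    dist = l2_distance_matrix(curves, dt)

    y_dep = curves.mean(axis=1) + 0.1 * rng.normal(size=n)  # depends on X
    y_ind = rng.normal(size=n)                              # independent of X
    print("dependent:  ", ac_coefficient(dist, y_dep))
    print("independent:", ac_coefficient(dist, y_ind))
```

Only the distance matrix depends on the covariate space, which is what makes the extension to metric-space-valued $X$ natural at the computational level. The sketch does not implement the self-normalized test statistic, which additionally requires quantities derived from the nearest-neighbor graph.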
