Bias correction for Chatterjee's graph-based correlation coefficient
Mona Azadkia, Leihao Chen, Fang Han
TL;DR
This work analyzes the bias in Chatterjee's nearest neighbor (NN) graph-based coefficient for measuring dependence. It shows that the bias is asymptotically negligible when the covariate dimension satisfies $d \le 3$, and it proposes a regression-based bias correction that extends root-$n$ consistency to more general settings. Using a nonparametric ridge regression framework to estimate the conditional mean function $G_x(t)$ and to form a bias estimator in U-statistic form, the authors prove that the bias-corrected estimator $\widehat{T}_n^{\mathrm{bc}}$ is asymptotically normal with the same limiting variance as the original estimator. The approach combines a general bias-correction theory with ridge least squares, which controls the estimation error in expectation and enables valid inference via analytical variance estimators or the bootstrap. Simulations demonstrate improved finite-sample performance of the bias-corrected estimator in higher dimensions, highlighting its practical value for dependence testing in multivariate settings.
Abstract
Azadkia and Chatterjee (2021) recently introduced a simple nearest neighbor (NN) graph-based correlation coefficient that consistently detects both independence and functional dependence. Specifically, it approximates a measure of dependence that equals 0 if and only if the variables are independent, and 1 if and only if they are functionally dependent. However, this NN estimator includes a bias term that may vanish at a rate slower than root-$n$, preventing root-$n$ consistency in general. In this article, we (i) analyze this bias term closely and show that it could become asymptotically negligible when the dimension is smaller than four; and (ii) propose a bias-correction procedure for more general settings. In both regimes, we obtain estimators (either the original or the bias-corrected version) that are root-$n$ consistent and asymptotically normal.
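For orientation, the following is a minimal sketch of the original (uncorrected) NN graph-based coefficient in its unconditional form, written from the rank-based formula in Azadkia and Chatterjee (2021) as we understand it. It is not the bias-corrected estimator proposed in this article. The function name `nn_dependence_coefficient` is hypothetical, the nearest neighbor search is brute force, and ties among nearest neighbors are broken by index rather than at random, which is an assumption made for simplicity.

```python
import numpy as np


def nn_dependence_coefficient(y, x):
    """Sketch of the unconditional NN graph-based coefficient T_n(Y, X).

    y : array of shape (n,), the response.
    x : array of shape (n,) or (n, d), the covariates.
    Assumes n >= 2 and that y is not constant (otherwise the
    denominator is zero).
    """
    y = np.asarray(y, dtype=float).ravel()
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = y.shape[0]

    # R_i = #{j : y_j <= y_i} and L_i = #{j : y_j >= y_i}.
    r = (y[None, :] <= y[:, None]).sum(axis=1)
    l = (y[None, :] >= y[:, None]).sum(axis=1)

    # M(i): index of the nearest neighbor of x_i among the other points
    # (Euclidean distance, brute force, O(n^2 d) memory).
    diff = x[:, None, :] - x[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist2, np.inf)  # exclude each point from its own search
    m = dist2.argmin(axis=1)         # ties broken by lowest index (assumption)

    numerator = (n * np.minimum(r, r[m]) - l.astype(float) ** 2).sum()
    denominator = (l * (n - l)).sum()
    return numerator / denominator


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(500, 3))
    y_dependent = x[:, 0] ** 2 + x[:, 1]   # Y is a function of X
    y_independent = rng.normal(size=500)   # Y is independent of X
    print(nn_dependence_coefficient(y_dependent, x))
    print(nn_dependence_coefficient(y_independent, x))
```

In this sketch the first printed value should be close to 1 and the second close to 0, up to sampling noise and the bias term that the article studies. For large samples, a tree-based nearest neighbor search would be a natural replacement for the brute-force distance matrix.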
