Bias correction for Chatterjee's graph-based correlation coefficient

Mona Azadkia, Leihao Chen, Fang Han

TL;DR

This work analyzes the bias in Chatterjee's NN graph-based coefficient for measuring dependence, showing that the bias is negligible when the covariate dimension $d \le 3$ and proposing a regression-based bias-correction that extends root-$n$ consistency to general settings. By developing a nonparametric ridge-regression framework to estimate the conditional mean function $G_x(t)$ and form a bias estimator in U-statistic form, the authors prove that the bias-corrected estimator $\widehat{T}_n^{\mathrm{bc}}$ is asymptotically normal with the same limiting variance as the original estimator. The approach combines a general bias-correction theory with ridge LS to control estimation error in expectation, enabling valid inference via analytical variance estimators or bootstrap. Simulations demonstrate improved finite-sample performance of the bias-corrected estimator in higher dimensions, highlighting its practical value for dependence testing in multivariate settings.

Abstract

Azadkia and Chatterjee (2021) recently introduced a simple nearest neighbor (NN) graph-based correlation coefficient that consistently detects both independence and functional dependence. Specifically, it approximates a measure of dependence that equals 0 if and only if the variables are independent, and 1 if and only if they are functionally dependent. However, this NN estimator includes a bias term that may vanish at a rate slower than root-$n$, preventing root-$n$ consistency in general. In this article, we (i) analyze this bias term closely and show that it could become asymptotically negligible when the dimension is smaller than four; and (ii) propose a bias-correction procedure for more general settings. In both regimes, we obtain estimators (either the original or the bias-corrected version) that are root-$n$ consistent and asymptotically normal.
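
To make the object of study concrete, the following is a minimal Python sketch of the original (uncorrected) NN graph-based coefficient in its unconditional form, following the rank-based formula of Azadkia and Chatterjee (2021). It is not the bias-corrected estimator of this paper. The function name `nn_dependence_coefficient`, the brute-force neighbor search, and the deterministic tie-breaking are illustrative assumptions; the original definition breaks nearest-neighbor ties at random.

```python
import math
import random


def nn_dependence_coefficient(x, y):
    """Uncorrected NN graph-based coefficient T_n(Y, X), unconditional case.

    x: list of d-dimensional points (tuples or lists of floats)
    y: list of scalar responses, same length as x
    Hypothetical helper for illustration; brute-force O(n^2) search.
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("need matching samples with n >= 2")

    # R_i = #{j : y_j <= y_i} and L_i = #{j : y_j >= y_i}
    r = [sum(1 for yj in y if yj <= yi) for yi in y]
    l = [sum(1 for yj in y if yj >= yi) for yi in y]

    def nearest(i):
        # M(i): index of the nearest neighbor of x_i among the other points.
        # Assumption: ties go to the smallest index instead of a random pick.
        best_j, best_d = -1, math.inf
        for j in range(n):
            if j == i:
                continue
            d = math.dist(x[i], x[j])
            if d < best_d:
                best_j, best_d = j, d
        return best_j

    m = [nearest(i) for i in range(n)]

    numerator = sum(n * min(r[i], r[m[i]]) - l[i] ** 2 for i in range(n))
    denominator = sum(l[i] * (n - l[i]) for i in range(n))
    if denominator == 0:
        raise ValueError("y is constant; the coefficient is undefined")
    return numerator / denominator


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [(rng.random(), rng.random()) for _ in range(300)]
    y_dep = [a + b for a, b in xs]          # y is a function of x
    y_ind = [rng.random() for _ in xs]      # y is independent of x
    print(nn_dependence_coefficient(xs, y_dep))
    print(nn_dependence_coefficient(xs, y_ind))
```

On the functionally dependent sample the value should be close to 1, and on the independent sample close to 0, in line with the population measure described above. The gap between the finite-sample value and its population target is the bias that the paper analyzes.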

Paper Structure

This paper contains 13 sections, 20 theorems, 255 equations, 5 tables.

Key Result

Theorem 2.1

Assume Assumption asump:dgp.

Theorems & Definitions (39)

  • Theorem 2.1
  • Lemma 3.1: Equation (3.1), lin2022limit
  • Theorem 3.1: Bias rate
  • Theorem 3.2: Bias expansion
  • Theorem 4.1: Main result
  • Remark 4.1
  • Theorem 4.2: Uniform approximation rate
  • Proof of Theorem 3.1 (Bias rate)
  • Proof of Theorem 3.2 (Bias expansion)
  • Lemma 6.1
  • ...and 29 more