Detecting practically significant dependencies in metric spaces via distance correlations
Holger Dette, Marius Kroll
TL;DR
This work reframes independence testing for metric-space-valued data by targeting practically significant dependencies, formalized as $H_0^{rel}: \mathrm{dcor}(X,Y) \le \Delta$ vs $H_1^{rel}: \mathrm{dcor}(X,Y) > \Delta$. It develops a pivotal, self-normalized, sequential test based on a functional limit theorem for the sequential distance covariance process, valid under strictly stationary $\beta$-mixing data in spaces of strong negative type, and yields a data-adaptive threshold $\hat{\Delta}_\alpha$ with controlled type I error and consistent power. The methodology provides confidence intervals for $\mathrm{dcor}$ without resampling and extends to non-Euclidean and functional data, including time-series settings such as ARMA and GARCH models; finite-sample results demonstrate robust performance even at moderate to large dimensions. The paper also analyzes edge cases (perfect independence) and outlines avenues for extensions, such as variable importance screening and bioequivalence-type testing, highlighting substantial computational and practical benefits over resampling-based approaches.
Abstract
We take a different look at the problem of testing the independence of two metric-space-valued random variables using the distance correlation. Instead of testing whether the distance correlation vanishes exactly, we are interested in the hypothesis that it does not exceed a certain threshold. Our testing problem is motivated by the observation that in many cases it is more reasonable to test for a practically significant dependency, since it is rare that a hypothesis of perfect independence is exactly satisfied. This point of view also reflects statistical practice, where one often classifies the strength of the association into categories such as 'small', 'medium' and 'large', whose precise definitions depend on the specific application. To address these problems we develop a pivotal test for the hypothesis that the distance correlation between two random variables does not exceed a pre-specified threshold $\Delta$. We also determine a minimum value $\hat{\Delta}_\alpha$ from the data such that the hypothesis is rejected for all $\Delta \leq \hat{\Delta}_\alpha$ at controlled type I error $\alpha$. This quantity can be interpreted as a measure of evidence against the hypothesis that the distance correlation is less than or equal to $\Delta$. The new test is applicable to processes taking values in separable metric spaces of strong negative type, covering Euclidean as well as functional data. We do not assume independent observations, and instead prove our results for absolutely regular sample generating processes, a class that includes many time series such as ARMA and GARCH models. Our approach is based on a new functional limit theorem for the sequential distance correlation process, and can also be used to construct confidence intervals for the distance correlation without the need for resampling.
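To make the quantity under test concrete, the following minimal sketch computes the classical plug-in (V-statistic) sample distance correlation from two pairwise distance matrices, so it works for any metric once the distances are available. It is not the self-normalized sequential test described above and does not produce $\hat{\Delta}_\alpha$. The function names `pairwise_distances`, `double_center` and `distance_correlation` are hypothetical, the Euclidean metric is used only for illustration, and the normalization (square root of the ratio of squared distance covariances) is one common convention that may differ from the one used in the paper.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix of a sample (rows are observations)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return (
        d
        - d.mean(axis=0, keepdims=True)
        - d.mean(axis=1, keepdims=True)
        + d.mean()
    )


def distance_correlation(dx, dy):
    """Plug-in sample distance correlation from two n x n distance matrices.

    Assumed convention: sqrt(dcov^2(X,Y) / sqrt(dcov^2(X,X) * dcov^2(Y,Y))).
    """
    a = double_center(np.asarray(dx, dtype=float))
    b = double_center(np.asarray(dy, dtype=float))
    dcov2_xy = (a * b).mean()
    dcov2_xx = (a * a).mean()
    dcov2_yy = (b * b).mean()
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0.0:
        # At least one sample is constant; define the correlation as zero.
        return 0.0
    # Clip tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y_indep = rng.normal(size=500)
    y_dep = x ** 2 + 0.5 * rng.normal(size=500)
    dx = pairwise_distances(x)
    print("independent:", distance_correlation(dx, pairwise_distances(y_indep)))
    print("dependent:  ", distance_correlation(dx, pairwise_distances(y_dep)))
```

Because the function takes distance matrices rather than raw observations, the same code applies to functional or other non-Euclidean data by swapping in a different metric. The plug-in estimate is biased upward in finite samples, so comparing it naively with a threshold $\Delta$ does not control the type I error; that is the gap the pivotal test in the paper is designed to close.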
