An Interpretable Measure for Quantifying Predictive Dependence between Continuous Random Variables -- Extended Version

Renato Assunção; Flávio Figueiredo; Francisco N. Tinoco Júnior; Léo M. de Sá-Freire; Fábio Silva

TL;DR

This work introduces PREDEP, a fully non-parametric, asymmetric measure of predictive dependence for continuous pairs $(X,Y)$ that interprets dependence as the relative predictive loss incurred by ignoring $X$ when predicting $Y$. Extending Goodman–Kruskal’s idea from categorical variables to densities, PREDEP defines $S_Y$ and $S_{Y|X}$ and sets $\alpha_{Y|X}=(S_{Y|X}-S_Y)/S_{Y|X}$. The measure satisfies $\alpha\in[0,1]$, equals zero only under independence, and in the Gaussian case reduces to $\alpha=1-\sqrt{1-\rho^2}$. A bootstrap-based estimator built on a convolution-density interpretation makes the measure practical to compute, and experiments on more than 90,000 real and synthetic datasets show that PREDEP captures non-linear and non-functional dependencies while retaining a clear predictive interpretation, complementing existing measures such as MIC, dCor, MI, HSIC, and CMI. The method extends to multivariate settings and, because it is directional and interpretable, supports exploratory analysis and may be useful for causal discovery.
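The Gaussian link can be checked numerically. This summary does not spell out how $S_Y$ and $S_{Y|X}$ are defined, so the sketch below assumes $S_Y=\int f_Y(y)^2\,dy$ and $S_{Y|X}=E_X\left[\int f_{Y|X}(y\mid x)^2\,dy\right]$, a natural density analogue of the Goodman–Kruskal quantities. The function names and the choice of a standard bivariate Gaussian with correlation $\rho$ are illustrative and not taken from the paper.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return np.exp(-0.5 * z * z) / (sd * np.sqrt(2.0 * np.pi))

def squared_density_integral(mean, sd, grid):
    """Riemann-sum approximation of the integral of f(y)^2 over the grid."""
    dy = grid[1] - grid[0]
    return float(np.sum(normal_pdf(grid, mean, sd) ** 2) * dy)

def gaussian_alpha(rho, n_x=200, seed=0):
    """alpha_{Y|X} for a standard bivariate Gaussian with correlation rho (|rho| < 1).

    ASSUMPTION: S_Y is the integral of f_Y^2 and S_{Y|X} is the average over X
    of the integral of f_{Y|X}^2. These definitions are not stated in the summary.
    """
    grid = np.linspace(-12.0, 12.0, 24001)
    # Marginal of Y is N(0, 1).
    s_y = squared_density_integral(0.0, 1.0, grid)
    # Conditional of Y given X = x is N(rho * x, 1 - rho^2).
    cond_sd = np.sqrt(1.0 - rho ** 2)
    xs = np.random.default_rng(seed).standard_normal(n_x)
    s_y_given_x = float(
        np.mean([squared_density_integral(rho * x, cond_sd, grid) for x in xs])
    )
    return (s_y_given_x - s_y) / s_y_given_x

# Compare the numerical value with the closed form 1 - sqrt(1 - rho^2).
for rho in (0.0, 0.5, 0.9):
    print(rho, gaussian_alpha(rho), 1.0 - np.sqrt(1.0 - rho ** 2))
```

Under these assumed definitions the inner integral does not depend on $x$ in the Gaussian case, so the average over $X$ is exact for any sample of $x$ values, and the two printed columns should agree up to integration error.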

Abstract

A fundamental task in statistical learning is quantifying the joint dependence or association between two continuous random variables. We introduce a novel, fully non-parametric measure that assesses the degree of association between continuous variables $X$ and $Y$, capable of capturing a wide range of relationships, including non-functional ones. A key advantage of this measure is its interpretability: it quantifies the expected relative loss in predictive accuracy when the distribution of $X$ is ignored in predicting $Y$. This measure is bounded within the interval [0,1] and is equal to zero if and only if $X$ and $Y$ are independent. We evaluate the performance of our measure on over 90,000 real and synthetic datasets, benchmarking it against leading alternatives. Our results demonstrate that the proposed measure provides valuable insights into underlying relationships, particularly in cases where existing methods fail to capture important dependencies.

Paper Structure (12 sections, 21 equations, 10 figures, 3 tables, 1 algorithm)

Introduction
Related work
A new association measure for continuous variables
Properties of $\alpha$
Learning $\alpha$ from data
Empirical evaluation
Conclusions
Goodman-Kruskal association measure for categorical variables
Computing Predep from Data
Properties of $\alpha$: proofs and details
Performance with copula models
Analysis of the WHO dataset

Figures (10)

Figure 1: Left: Illustrative contingency table. Right: Regular grid on top of a scatterplot of a random sample $(x_k, y_k)$, $k=1, \ldots, N$ of the random vector $(X,Y)$ with joint density $f_{XY}(x,y)$.
Figure 2: Visualization of functional and non-functional relationships used for benchmarking.
Figure 3: Behavior of MIC and PREDEP in a functional relationship.
Figure 4: Behavior of MIC and PREDEP in non-functional relationships.
Figure 5: Application of PREDEP and MIC metrics to the 96,980 selected indicator pairs dataset.
...and 5 more figures
