
An Interpretable Measure for Quantifying Predictive Dependence between Continuous Random Variables -- Extended Version

Renato Assunção, Flávio Figueiredo, Francisco N. Tinoco Júnior, Léo M. de Sá-Freire, Fábio Silva

TL;DR

This work introduces PREDEP, a fully non-parametric, asymmetric measure of predictive dependence for continuous pairs $(X,Y)$ that interprets dependence as the relative loss in predictive accuracy incurred by ignoring $X$ when predicting $Y$. Extending Goodman–Kruskal’s idea to densities, PREDEP defines $S_Y$ and $S_{Y|X}$ and yields $\alpha_{Y|X}=(S_{Y|X}-S_Y)/S_{Y|X}$, where $\alpha_{Y|X}\in[0,1]$ and $\alpha_{Y|X}=0$ if and only if $X$ and $Y$ are independent; in the bivariate Gaussian case with correlation $\rho$, $\alpha_{Y|X}=1-\sqrt{1-\rho^2}$. A bootstrap-based estimator that leverages a convolution-density interpretation makes the measure practical to compute. Experiments on more than 90,000 real and synthetic datasets show that PREDEP captures non-linear and non-functional dependencies while offering a clear predictive interpretation, complementing existing measures such as MIC, dCor, MI, HSIC, and CMI. The method extends to multivariate settings and, because it is directional and interpretable, is useful for exploratory analysis and potentially for causal discovery.
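
The Gaussian link can be checked by hand. The sketch below assumes that $S_Y=\int f_Y(y)^2\,dy$ and $S_{Y|X}=E_X\left[\int f_{Y|X}(y|x)^2\,dy\right]$; these definitions are not spelled out in this summary, so they are an assumption, chosen because they reproduce the stated result. The symbol $\sigma_Y$ (the marginal standard deviation of $Y$) is also introduced here for illustration. It relies on the standard fact that a normal density with standard deviation $\sigma$ satisfies $\int f(y)^2\,dy = 1/(2\sigma\sqrt{\pi})$, and that for a bivariate Gaussian the conditional standard deviation of $Y$ given $X=x$ is $\sigma_Y\sqrt{1-\rho^2}$ for every $x$.

$$
\begin{aligned}
S_Y &= \frac{1}{2\sigma_Y\sqrt{\pi}}, \qquad
S_{Y|X} = \frac{1}{2\sigma_Y\sqrt{1-\rho^2}\,\sqrt{\pi}},\\
\alpha_{Y|X} &= \frac{S_{Y|X}-S_Y}{S_{Y|X}} = 1-\frac{S_Y}{S_{Y|X}} = 1-\sqrt{1-\rho^2}.
\end{aligned}
$$

For example, $\rho=0.8$ gives $\alpha_{Y|X}=1-\sqrt{0.36}=0.4$, and $\rho=0$ gives $\alpha_{Y|X}=0$, in line with the independence property.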

Abstract

A fundamental task in statistical learning is quantifying the joint dependence or association between two continuous random variables. We introduce a novel, fully non-parametric measure that assesses the degree of association between continuous variables $X$ and $Y$, capable of capturing a wide range of relationships, including non-functional ones. A key advantage of this measure is its interpretability: it quantifies the expected relative loss in predictive accuracy when the distribution of $X$ is ignored in predicting $Y$. This measure is bounded within the interval [0,1] and is equal to zero if and only if $X$ and $Y$ are independent. We evaluate the performance of our measure on over 90,000 real and synthetic datasets, benchmarking it against leading alternatives. Our results demonstrate that the proposed measure provides valuable insights into underlying relationships, particularly in cases where existing methods fail to capture important dependencies.
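
To make the "relative loss in predictive accuracy" reading concrete, the following is a minimal numerical sketch for the bivariate Gaussian case only. It is not the paper's bootstrap-based estimator: it plugs in the known Gaussian densities and averages them over simulated pairs. It assumes $S_Y = E[f_Y(Y)]$ and $S_{Y|X} = E[f_{Y|X}(Y|X)]$, definitions that this summary does not state explicitly; the function names `gaussian_link`, `normal_pdf`, and `monte_carlo_alpha` are hypothetical.

```python
import math
import random


def gaussian_link(rho):
    """Closed-form value quoted in the TL;DR for a bivariate Gaussian."""
    return 1.0 - math.sqrt(1.0 - rho * rho)


def normal_pdf(z, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def monte_carlo_alpha(rho, n=50_000, seed=0):
    """Monte Carlo value of (S_{Y|X} - S_Y) / S_{Y|X} for standard
    bivariate Gaussian data with correlation rho (requires |rho| < 1).

    Assumption (not taken from the summary): S_Y = E[f_Y(Y)] and
    S_{Y|X} = E[f_{Y|X}(Y|X)], evaluated with the true densities.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of Y given X = x
    s_y = 0.0
    s_y_given_x = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + cond_sd * rng.gauss(0.0, 1.0)
        s_y += normal_pdf(y, 0.0, 1.0)                # marginal density at y
        s_y_given_x += normal_pdf(y, rho * x, cond_sd)  # conditional density at y
    s_y /= n
    s_y_given_x /= n
    return (s_y_given_x - s_y) / s_y_given_x


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.8, 0.95):
        print(rho, gaussian_link(rho), monte_carlo_alpha(rho))
```

Under these assumptions the simulated value should track the closed form closely, and both are zero at $\rho=0$. For real data the densities are unknown, which is where an estimator such as the paper's bootstrap procedure is needed.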

Paper Structure

This paper contains 12 sections, 21 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Left: Illustrative contingency table. Right: Regular grid on top of a scatterplot of a random sample $(x_k, y_k)$, $k=1, \ldots, N$ of the random vector $(X,Y)$ with joint density $f_{XY}(x,y)$.
  • Figure 2: Visualization of functional and non-functional relationships used for benchmarking.
  • Figure 3: Behavior of MIC and PREDEP in a functional relationship.
  • Figure 4: Behavior of MIC and PREDEP in non-functional relationships.
  • Figure 5: PREDEP and MIC applied to the dataset of 96,980 selected indicator pairs.
  • ...and 5 more figures