
An inferential measure of dependence between two systems using Bayesian model comparison

Guillaume Marrelec, Alain Giron

TL;DR

The paper presents a principled Bayesian framework for quantifying dependence between two systems. It compares a model of independence, $H_0$, with a model of dependence, $H_1$, and defines the dependence measure $B(X,Y|D) = P(H_1|D)$ (or any strictly increasing function of it). It shows how $B$ relates to classical dependence notions such as mutual information and the log-likelihood ratio. It derives the asymptotic behavior of $B$ for known distributions, likelihoods with unknown parameters, nested models, and copula-based dependence, and it analyzes the effects of model misspecification. Through extensive simulations and a real-life EEG application, the authors show that the log-posterior odds $\mathfrak{B}_{\mathrm{logr}}$ increases with sample size and with the true strength of dependence, decreases with sample size under independence, and behaves as expected of an inferential measure of dependence. The framework provides an interpretable, model-based, data-driven measure that complements existing metrics by quantifying the evidence for dependence under a specified model of dependence, while highlighting the importance of model and prior choice. Overall, the work bridges Bayesian model comparison and information-theoretic dependence concepts, and offers a versatile toolkit for assessing dependence in diverse data settings.
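Consider fully specified models, with no unknown parameters, in which $H_0$ is the product of the marginals of $H_1$. In that case $\mathfrak{B}_{\mathrm{logr}}$ equals the log prior odds plus the summed log-likelihood ratio, and when $H_1$ is true the average log-likelihood ratio per observation converges to the mutual information $I(X;Y)$. The Python sketch below checks this numerically for a standard bivariate normal with correlation $\rho$, whose mutual information is $-\tfrac12\ln(1-\rho^2)$ nats. The distribution, seed, and sample size are illustrative assumptions rather than the paper's setup, and the function names are hypothetical.

```python
import numpy as np

def log_lr_known(x: np.ndarray, y: np.ndarray, rho: float) -> float:
    """Summed ln p(x,y|H1) - ln[p(x|H0) p(y|H0)], where H1 is a standard bivariate
    normal with known correlation rho and H0 has independent N(0,1) marginals."""
    q = (x**2 - 2.0 * rho * x * y + y**2) / (1.0 - rho**2)
    log_p1 = -np.log(2.0 * np.pi) - 0.5 * np.log(1.0 - rho**2) - 0.5 * q
    log_p0 = -np.log(2.0 * np.pi) - 0.5 * (x**2 + y**2)
    return float(np.sum(log_p1 - log_p0))

def mutual_information_gaussian(rho: float) -> float:
    """Mutual information (in nats) of a bivariate normal with correlation rho."""
    return float(-0.5 * np.log(1.0 - rho**2))

# Illustrative check: simulate N draws under H1 and compare the
# per-observation log-likelihood ratio with I(X;Y).
rng = np.random.default_rng(0)
rho, n = 0.5, 200_000
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
llr = log_lr_known(xy[:, 0], xy[:, 1], rho)
print(llr / n, mutual_information_gaussian(rho))
```

When the data are independent, the same sum is negative on average, since its expectation is $-\mathrm{KL}(p_0 \,\|\, p_1)$. The known-distribution $\mathfrak{B}_{\mathrm{logr}}$ therefore drifts downward as $N$ grows.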

Abstract

We propose to quantify dependence between two systems $X$ and $Y$ in a dataset $D$ based on the Bayesian comparison of two models: one, $H_0$, of statistical independence and another one, $H_1$, of dependence. In this framework, dependence between $X$ and $Y$ in $D$, denoted $B(X,Y|D)$, is quantified as $P(H_1|D)$, the posterior probability for the model of dependence given $D$, or any strictly increasing function thereof. It is therefore a measure of the evidence for dependence between $X$ and $Y$ as modeled by $H_1$ and observed in $D$. We review several statistical models and reconsider standard results in the light of $B(X,Y|D)$ as a measure of dependence. Using simulations, we focus on two specific issues: the effect of noise and the behavior of $B(X,Y|D)$ when $H_1$ has a parameter coding for the intensity of dependence. We then derive some general properties of $B(X,Y|D)$, showing that it quantifies the information contained in $D$ in favor of $H_1$ versus $H_0$. While some of these properties are typical of what is expected from a valid measure of dependence, others are novel and naturally appear as desired features for specific measures of dependence, which we call inferential. We finally put these results in perspective; in particular, we discuss the consequences of using the Bayesian framework as well as the similarities and differences between $B(X,Y|D)$ and mutual information.
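$B(X,Y|D)$ may be $P(H_1|D)$ or any strictly increasing function of it, so the posterior probability and the log-posterior odds rank datasets identically. The Python sketch below shows the conversion between the two. It assumes the log marginal likelihoods $\ln p(D|H_0)$ and $\ln p(D|H_1)$ have already been computed, and its function names are hypothetical.

```python
import math

def log_posterior_odds(log_ml_h1: float, log_ml_h0: float, prior_h1: float = 0.5) -> float:
    """ln[P(H1|D) / P(H0|D)] = ln p(D|H1) - ln p(D|H0) + ln[P(H1) / P(H0)]."""
    if not 0.0 < prior_h1 < 1.0:
        raise ValueError("prior_h1 must lie strictly between 0 and 1")
    log_prior_odds = math.log(prior_h1) - math.log1p(-prior_h1)
    return (log_ml_h1 - log_ml_h0) + log_prior_odds

def posterior_h1(log_odds: float) -> float:
    """P(H1|D) as the logistic function of the log-posterior odds.
    The two branches avoid overflow in exp for large |log_odds|."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

For example, a log Bayes factor of 3 with equal prior probabilities gives log-posterior odds of 3 and $P(H_1|D) \approx 0.953$.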

Paper Structure

This paper contains 34 sections, 27 equations, 5 figures.

Figures (5)

  • Figure 1: Simulation study: bivariate normal distribution with noise. Boxplots (median and $[ 25\%,75\%]$ percentiles) of $\mathfrak{B}_{\mathrm{logr}} ( \mathcal{X}, \mathcal{Y} | D )$ in various conditions. Top left: Effect of $N$ and $\sigma^2$ for simulations with $\rho = 0$. Top right: Effect of $N$ and $\rho > 0$ for simulations with $\sigma^2 = 0.1$. Bottom left: Effect of prior $\mathrm{p} ( \rho | H_1 )$ for $\rho \in \{ -0.2, -0.1, 0, 0.1, 0.2 \}$, $\sigma^2 = 10^{-4}$ and $N = 200$. Bottom right: Effect of $\rho$ and $\sigma^2$ for datasets of size $N = 100$.
  • Figure 2: Simulation study: functional dependence with noise. Boxplots (median and $[ 25\%,75\%]$ percentiles) of the effect of $\sigma^2$ and $N$ on $\mathfrak{B}_{\mathrm{logr}} ( \mathcal{X}, \mathcal{Y} | D )$ when the true model is either $H_1$ (top) or $H_0$ (bottom).
  • Figure 3: Simulation study: dependence through copula. Boxplots (median and $[ 25\%,75\%]$ percentiles) of the effect of $\rho$ and $N$ on $\mathfrak{B}_{\mathrm{logr}} ( \mathcal{X}, \mathcal{Y} | D )$ when $H_1$ is true (top), and of the effect of $N$ when $H_0$ is true (bottom).
  • Figure 4: Simulation study: dependence of two chaotic systems. Boxplots (median and $[ 25\%,75\%]$ percentiles) of the effect of the coupling parameter $C$ when $N= 50$ (top), and of the effect of $N$ with either $C = 1$, corresponding to $H_1$ true (middle), or $C = 0$, corresponding to $H_0$ true (bottom).
  • Figure 5: Real-life application. $\mathfrak{B}_{\mathrm{logr}} ( \mathcal{X}, \mathcal{Y} | D )$ as a function of $\overline{R}$ for various values of $N$ (top) and as a function of $N$ for various values of $\overline{R}$ (middle). Bottom panel: contour plot of $\mathfrak{B}_{\mathrm{logr}} ( \mathcal{X}, \mathcal{Y} | D )$, together with $[ N_0 ( \overline{R} ), \overline{R} ]$ (black solid line and circles) and $( N, 1/\sqrt{N} )$ (black dashed line).
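A short Python simulation can mimic the setting of Figure 1. Every modeling detail below is an assumption made for illustration, not the paper's specification:

- latent pairs are standard bivariate normal with correlation $\rho$;
- each observed coordinate gets independent $\mathcal{N}(0,\sigma^2)$ noise;
- both models treat the marginal variance $1+\sigma^2$ as known;
- $H_1$ puts a uniform prior on the correlation $r$ of the observations over $(-1,1)$, and $H_0$ fixes $r=0$;
- $p(D|H_1)$ is obtained by trapezoidal integration over a grid of $r$ values.

The helper names are hypothetical.

```python
import numpy as np

def log_lik(sxx, syy, sxy, n, r, s2):
    """Log-likelihood of n pairs under a zero-mean bivariate normal with common
    variance s2 and correlation r, written in terms of the sufficient statistics."""
    q = (sxx - 2.0 * r * sxy + syy) / (s2 * (1.0 - r**2))
    return -n * np.log(2.0 * np.pi * s2) - 0.5 * n * np.log(1.0 - r**2) - 0.5 * q

def b_logr(x, y, s2, prior_h1=0.5, grid_size=4001):
    """Log-posterior odds of H1 (unknown r, assumed uniform prior on (-1, 1))
    versus H0 (r = 0), with the known common variance s2 assumed."""
    n = x.size
    sxx, syy, sxy = np.dot(x, x), np.dot(y, y), np.dot(x, y)
    r = np.linspace(-0.9999, 0.9999, grid_size)  # stay inside the open interval
    ll = log_lik(sxx, syy, sxy, n, r, s2)
    # Trapezoid weights times the prior density 1/2, summed stably in log space.
    w = np.full(grid_size, r[1] - r[0])
    w[0] = w[-1] = 0.5 * (r[1] - r[0])
    m = ll.max()
    log_ml1 = m + np.log(np.sum(0.5 * w * np.exp(ll - m)))
    log_ml0 = log_lik(sxx, syy, sxy, n, 0.0, s2)
    return log_ml1 - log_ml0 + np.log(prior_h1 / (1.0 - prior_h1))

def simulate(rho, sigma2, n, rng):
    """Assumed noise model: latent standard bivariate normal pairs with
    correlation rho, plus independent N(0, sigma2) noise on each coordinate."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    latent = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    noisy = latent + rng.normal(0.0, np.sqrt(sigma2), size=(n, 2))
    return noisy[:, 0], noisy[:, 1]

rng = np.random.default_rng(1)
sigma2 = 0.1
for rho in (0.0, 0.2, 0.5):
    x, y = simulate(rho, sigma2, 200, rng)
    print(rho, b_logr(x, y, 1.0 + sigma2))
```

Under these assumptions, the noise attenuates the observable correlation to $\rho/(1+\sigma^2)$. This is one way noise can lower $\mathfrak{B}_{\mathrm{logr}}$ for a given $\rho$ and $N$.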