
Quantification and cross-fitting inference of asymmetric relations under generative exposure mapping models

Soumik Purkayastha, Peter X. -K. Song

TL;DR

The paper introduces an entropy-based asymmetry coefficient $C_{X\rightarrow Y}=H(X)-H(Y)$ within the generative exposure mapping (GEM) framework to infer directionality without requiring a pre-specified causal order. It develops robust estimation and inference via fast Fourier transform (FFT) density estimation and cross-fitting, and extends GEMs to noise-perturbed settings (NPGEM) with theoretical guarantees. Through simulations, it demonstrates consistency, asymptotic normality, and practical performance against established causal-discovery methods, including on benchmark data. The method is applied to epigenetic data linking DNA methylation (DNAm) and blood pressure (BP), revealing BP-to-DNAm directional pathways for several cardiovascular genes and illustrating GEM-induced asymmetry as an informative, data-driven imprint of underlying causality with broad discovery potential.
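To make $C_{X\rightarrow Y}$ concrete in the noise-free case, here is a small worked example. It relies on the standard change-of-variables identity for differential entropy, which holds for strictly monotone, differentiable $g$. The paper's class of generative functions may be broader, so treat this as an illustration rather than the paper's own derivation (entropies in nats).

```latex
% Change of variables for strictly monotone, differentiable g (standard identity)
H(Y) = H\big(g(X)\big) = H(X) + \mathbb{E}\big[\log\lvert g'(X)\rvert\big]
\quad\Longrightarrow\quad
C_{X\rightarrow Y} = H(X) - H(Y) = -\,\mathbb{E}\big[\log\lvert g'(X)\rvert\big]

% Example: X \sim \mathrm{Uniform}(0,1) (so H(X) = 0) and g(x) = x^2, g'(x) = 2x
\mathbb{E}\big[\log(2X)\big] = \log 2 + \int_0^1 \log x \, dx = \log 2 - 1
\quad\Longrightarrow\quad
C_{X\rightarrow Y} = 1 - \log 2 \approx 0.307
```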

Abstract

Learning directionality between variables is crucial yet challenging, especially for mechanistic relationships without a priori ordering assumptions. We propose a coefficient of asymmetry to quantify directional asymmetry using Shannon's entropy within a generative exposure mapping (GEM) framework. GEMs arise from experiments where a generative function $g$ maps exposure $X$ to outcome $Y$ through $Y = g(X)$, extended to noise-perturbed GEMs as $Y = g(X) + \varepsilon$. Our approach considers a rich class of generative functions while providing statistical inference for uncertainty quantification, addressing a gap in existing bivariate causal discovery techniques. We establish large-sample theoretical guarantees through data-splitting and cross-fitting techniques, implementing fast Fourier transform (FFT)-based density estimation to avoid parameter tuning. The methodology accommodates contamination in outcome measurements. Extensive simulations demonstrate superior performance compared to competing causal discovery methods. Applied to epigenetic data examining DNA methylation and blood pressure relationships, our method unveils novel pathways for cardiovascular disease genes \emph{FGF5} and \emph{HSD11B2}. This framework serves as a discovery tool for improving scientific research rigor, with GEM-induced asymmetry representing a low-dimensional imprint of underlying causality.
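The estimation strategy described above (density estimation by FFT, then data splitting and cross-fitting for inference) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses a Gaussian-kernel density estimate convolved on a grid via FFT with Silverman's rule-of-thumb bandwidth. That bandwidth rule is a swapped-in choice; the paper describes its FFT estimator as tuning-free, so its actual method differs. Each observation's $-\log \hat f$ term comes from a density fitted on the other folds. The 95% interval is a plain Wald interval from the per-observation differences, which is also an assumption rather than the paper's variance estimator. The function names `fft_kde`, `neg_log_density_terms` and `asymmetry_crossfit` are hypothetical.

```python
import numpy as np


def fft_kde(train, lo, hi, n_grid=2048):
    """Gaussian KDE of `train` on a regular grid, computed by FFT convolution.

    Bandwidth: Silverman's rule of thumb (an assumption of this sketch).
    The grid spans [lo, hi] padded by 5 bandwidths on each side.
    """
    n = train.size
    q75, q25 = np.percentile(train, [75, 25])
    h = 0.9 * min(train.std(ddof=1), (q75 - q25) / 1.34) * n ** (-0.2)
    grid = np.linspace(lo - 5 * h, hi + 5 * h, n_grid)
    dx = grid[1] - grid[0]
    # Simple binning: each sample is counted at its nearest grid point.
    counts, _ = np.histogram(train, bins=n_grid,
                             range=(grid[0] - dx / 2, grid[-1] + dx / 2))
    # Zero-pad to 2 * n_grid so the circular FFT convolution does not wrap.
    m = 2 * n_grid
    idx = np.arange(m)
    offsets = np.where(idx < n_grid, idx, idx - m) * dx
    kernel = np.exp(-0.5 * (offsets / h) ** 2) / (np.sqrt(2 * np.pi) * h)
    smoothed = np.fft.irfft(np.fft.rfft(counts, m) * np.fft.rfft(kernel), m)
    return grid, smoothed[:n_grid] / n


def neg_log_density_terms(z, folds, n_folds):
    """Cross-fitted terms -log f_hat(z_i), with f_hat fitted without z_i's fold."""
    lo, hi = z.min(), z.max()  # grid covers every point that will be evaluated
    out = np.empty_like(z, dtype=float)
    for k in range(n_folds):
        held_out = folds == k
        grid, dens = fft_kde(z[~held_out], lo, hi)
        f_hat = np.interp(z[held_out], grid, dens)
        out[held_out] = -np.log(np.clip(f_hat, 1e-12, None))
    return out


def asymmetry_crossfit(x, y, n_folds=2, seed=0):
    """Estimate C_{X->Y} = H(X) - H(Y) with a normal-approximation 95% CI."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Random, balanced fold labels shared by x and y so pairs stay aligned.
    folds = rng.permutation(np.arange(x.size) % n_folds)
    d = (neg_log_density_terms(x, folds, n_folds)
         - neg_log_density_terms(y, folds, n_folds))
    est = d.mean()
    se = d.std(ddof=1) / np.sqrt(d.size)
    return est, (est - 1.96 * se, est + 1.96 * se)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=4000)
    est, ci = asymmetry_crossfit(x, 0.5 * x)  # true C = -log(0.5) = log 2
    print(f"C_hat = {est:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because Silverman's bandwidth scales with the data, the linear map $Y = 0.5X$ in the usage example recovers $\log 2 \approx 0.693$ essentially exactly. Nonlinear $g$ would add smoothing and boundary error that this simple sketch does not correct.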

Paper Structure (24 sections, 6 theorems, 20 equations, 4 tables, 1 algorithm)

Key Result

Theorem 1

We consider an NPGEM in which the observed outcome is $Y^* = g(X) + \varepsilon$, with $g \in \mathcal{G}_{-}$, and let $f_Y$ and $H(Y)$ denote the density and entropy of $Y = g(X)$, respectively. Then the entropy of $Y^*$ is bounded above by $H\left(Y^*\right) \leq H(Y) + \frac{1}{2} \log \left(\sigma^\prime I(Y) + 1 \right)$,
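A Gaussian case shows that the bound in Theorem 1 can be tight. This check rests on assumptions that the excerpt does not state: that $I(Y)$ is the Fisher information of $Y$, and that $\varepsilon \sim N(0, \sigma^\prime)$ is independent of $Y$, so that $\sigma^\prime$ is the noise variance. It also does not check whether a particular $g$ belongs to $\mathcal{G}_{-}$.

```latex
% Assumptions: \varepsilon \sim N(0, \sigma^\prime) independent of Y; I(Y) = Fisher information
Y \sim N(\mu, s^2):\quad H(Y) = \tfrac{1}{2}\log\left(2\pi e\, s^2\right), \qquad I(Y) = 1/s^2
\\
Y^* = Y + \varepsilon \sim N(\mu, s^2 + \sigma^\prime):\quad
H\left(Y^*\right) = \tfrac{1}{2}\log\left(2\pi e\,(s^2 + \sigma^\prime)\right)
\\
H(Y) + \tfrac{1}{2}\log\left(\sigma^\prime I(Y) + 1\right)
  = \tfrac{1}{2}\log\left(2\pi e\, s^2\right) + \tfrac{1}{2}\log\frac{s^2 + \sigma^\prime}{s^2}
  = H\left(Y^*\right)
```

Under these assumptions the inequality holds with equality, so no smaller additive term of this form would work for every Gaussian $Y$.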

Theorems & Definitions (17)

  • Definition 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Definition 2
  • Remark 5
  • Remark 6
  • Remark 7
  • Theorem 1
  • ...and 7 more