Multi-omic Causal Discovery using Genotypes and Gene Expression

Stephen Asiedu, David Watson

TL;DR

Causal discovery in multi-omic data is challenged by high dimensionality and hidden confounders. The authors introduce GENESIS, a constraint-based, genotype-anchored approach that starts with an empty ancestrality matrix and uses marginal and conditional independence tests to orient edges, producing a partially oriented graph that guides downstream methods. Key contributions include the deactivator and activator concepts, three CI-based inference rules, a grow-shrink Markov blanket procedure, and a polynomial-time complexity bound of $\mathcal{O}(d_Z d_X^2)$ for the oracle version, with reported improvements over FCI and GES on simulated data and a yeast cis-eQTL dataset. This genotype-anchored preprocessing step narrows the search space to biologically plausible edges, which the authors present as enabling faster and more reliable causal discovery for functional genomics, drug discovery, and precision medicine.

Abstract

Causal discovery in multi-omic datasets is crucial for understanding the bigger picture of gene regulatory mechanisms, but remains challenging due to high dimensionality, the difficulty of distinguishing direct from indirect relationships, and hidden confounders. We introduce GENESIS (GEne Network inference from Expression SIgnals and SNPs), a constraint-based algorithm that leverages the natural causal precedence of genotypes to infer ancestral relationships in transcriptomic data. Unlike traditional causal discovery methods that start with a fully connected graph, GENESIS initialises an empty ancestrality matrix and iteratively populates it with direct, indirect or non-causal relationships using a series of provably sound marginal and conditional independence tests. By integrating genotypes as fixed causal anchors, GENESIS provides a principled "head start" to classical causal discovery algorithms, restricting the search space to biologically plausible edges. We test GENESIS on synthetic and real-world genomic datasets. This framework offers a powerful avenue for uncovering causal pathways in complex traits, with promising applications to functional genomics, drug discovery, and precision medicine.
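
To make the role of marginal and conditional independence tests concrete, the following is a minimal sketch rather than the GENESIS procedure itself, whose inference rules are not spelled out in this summary. It assumes a toy linear-Gaussian chain (genotype Z causes expression X1, which causes X2) and uses a Fisher z-test on partial correlations. The helper names `partial_corr` and `fisher_z_pvalue`, the effect sizes, and the sample size are all illustrative assumptions.

```python
import math
import numpy as np

def partial_corr(data, i, j, cond=()):
    """Partial correlation of columns i and j given the columns in cond."""
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)  # precision matrix of the selected columns
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])

def fisher_z_pvalue(data, i, j, cond=()):
    """Two-sided p-value for H0: columns i and j are independent given cond."""
    n = data.shape[0]
    r = partial_corr(data, i, j, cond)
    r = min(max(r, -0.999999), 0.999999)  # guard against log of zero
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))

# Hypothetical toy data: Z -> X1 -> X2 (assumed structure, not from the paper).
rng = np.random.default_rng(0)
n = 2000
z = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
x1 = 1.0 * z + rng.normal(size=n)
x2 = 1.0 * x1 + rng.normal(size=n)
data = np.column_stack([z, x1, x2])

# Marginal test: Z and X2 are dependent, because Z is an ancestor of X2.
p_marginal = fisher_z_pvalue(data, 0, 2)
# Conditional test: given X1, Z carries no further information about X2.
p_conditional = fisher_z_pvalue(data, 0, 2, cond=(1,))
print(f"p(Z indep X2) = {p_marginal:.3g}")
print(f"p(Z indep X2 | X1) = {p_conditional:.3g}")
```

In this toy setting the genotype is marginally associated with X2, but the association vanishes once X1 is conditioned on, which is the kind of pattern a genotype-anchored method can use to place X1 upstream of X2.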

Paper Structure

This paper contains 8 sections, 2 theorems, 1 figure, 2 algorithms.

Key Result

Theorem 1

All inferences returned by GENESIS-Oracle hold in the true $\mathcal{G}_X$. Moreover, if $\mathbf M_{ij} = i \prec j$, then the set of combined Markov blankets $S = MB(X_i) \cup MB(X_j)$ is a valid adjustment set for $(X_i, X_j)$.
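
The adjustment-set claim can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' estimator: the graph (an observed common cause X_c of both X_i and X_j, plus a direct effect of X_i on X_j), the linear model, the effect size of 0.5, and the hand-written Markov blankets are all hypothetical. It also assumes that X_i and X_j themselves are dropped from the union before adjusting, which is one natural reading of the theorem.

```python
import numpy as np

# Hypothetical linear model: X_c -> X_i, X_c -> X_j, and X_i -> X_j.
rng = np.random.default_rng(1)
n = 5000
true_effect = 0.5
x_c = rng.normal(size=n)
x_i = 0.8 * x_c + rng.normal(size=n)
x_j = true_effect * x_i + 0.8 * x_c + rng.normal(size=n)
data = {"X_c": x_c, "X_i": x_i, "X_j": x_j}

# Markov blankets written down by hand from the simulated graph
# (in practice they would be estimated, e.g. by a grow-shrink procedure).
markov_blanket = {"X_i": {"X_c", "X_j"}, "X_j": {"X_c", "X_i"}}

# S = MB(X_i) U MB(X_j), with the pair itself removed (an assumption here).
adjustment_set = sorted(
    (markov_blanket["X_i"] | markov_blanket["X_j"]) - {"X_i", "X_j"}
)

def ols_slope(outcome, treatment, covariate_names=()):
    """Coefficient on `treatment` in an OLS fit with intercept and covariates."""
    columns = [np.ones(len(treatment)), treatment]
    columns += [data[name] for name in covariate_names]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]

naive = ols_slope(x_j, x_i)                      # confounded by X_c
adjusted = ols_slope(x_j, x_i, adjustment_set)   # adjusts for S
print(f"adjustment set: {adjustment_set}")
print(f"naive slope: {naive:.3f}, adjusted slope: {adjusted:.3f}")
```

Without adjustment the regression slope absorbs the confounding path through X_c and overstates the effect; adjusting for the combined blanket recovers a value close to the simulated effect of 0.5.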

Figures (1)

  • Figure 1: Results on real-world and simulated data

Theorems & Definitions (6)

  • Definition 1: Deactivator
  • Definition 2: Activator
  • Theorem 1: Soundness
  • Theorem 2: Complexity
  • proof
  • proof