Causal Network Discovery from Interventional Count Data with Latent Linear DAGs

Yijiao Zhang, Hongzhe Li

Abstract

The increasing availability of interventional data offers new opportunities for causal discovery, with gene perturbation studies providing a prominent example. Such data are typically count-valued and subject to substantial measurement error arising from technical variability and latent state heterogeneity. Motivated by these challenges, we study identification and estimation in latent linear structural causal models for interventional count data. We propose a latent linear Gaussian directed acyclic graph (DAG) model with Poisson measurement error that explicitly separates the latent causal structure from the observed counts. Under a mean-shift intervention design, we establish population-level identifiability of the latent causal DAG. Building on these identification results, we develop an estimation procedure based on sparse inverse matrix estimation and provide theoretical guarantees on estimation error and finite-sample causal discovery. Simulation studies and applications to Perturb-seq data demonstrate the practical effectiveness of the proposed method.

Paper Structure

This paper contains 21 sections, 4 theorems, 19 equations, 5 figures, 1 algorithm.

Key Result

Theorem 3.1

Under the model defined by equations (eq:obs-model) through (eq:sparse-mean-shift), the causal mechanism $A$ is identifiable by jointly leveraging observational and interventional data, provided that each node is subject to at least one non-vanishing one-sparse mean-shift intervention in the latent expression model.
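
To give some intuition for why mean-shift interventions can reveal $A$, the sketch below simulates a toy version of the setting and recovers $A$ from log-mean shifts. It is an illustration under stated assumptions, not the paper's estimator (which is based on sparse inverse matrix estimation). Assumptions not taken from the text above: counts follow a Poisson distribution with a log link, $X_j \sim \mathrm{Poisson}(\exp(Z_j))$; the latent noise is Gaussian with a common standard deviation; `A[i, j]` denotes the effect of node `i` on node `j`; and the function names `simulate_counts` and `recover_A_from_mean_shifts` are hypothetical. Under these assumptions a mean shift leaves the latent covariance unchanged, so the change in $\log E[X_j]$ under a shift of size $\delta_k$ on node $k$ equals $\delta_k \,[(I - A)^{-1}]_{kj}$.

```python
import numpy as np

def simulate_counts(A, mu, n, rng, noise_sd=0.3):
    """Draw n count vectors from a toy latent linear DAG (assumed log link).

    Latent model in row form: z = z A + mu + eps, so z = (mu + eps)(I - A)^{-1}.
    Observed counts: X ~ Poisson(exp(z)), elementwise.
    """
    p = A.shape[0]
    eps = rng.normal(scale=noise_sd, size=(n, p))
    Z = (mu + eps) @ np.linalg.inv(np.eye(p) - A)
    return rng.poisson(np.exp(Z))

def recover_A_from_mean_shifts(X_obs, X_int):
    """Recover A from log-mean shifts; X_int[k] is data with a shift on node k only."""
    p = X_obs.shape[1]
    log_mean_obs = np.log(X_obs.mean(axis=0))
    # S[k, j] estimates delta_k * B[k, j], where B = (I - A)^{-1}.
    S = np.stack([np.log(X_int[k].mean(axis=0)) - log_mean_obs for k in range(p)])
    # For a DAG, B has unit diagonal, so S[k, k] estimates the unknown delta_k.
    B = S / np.diag(S)[:, None]
    return np.eye(p) - np.linalg.inv(B)

rng = np.random.default_rng(0)
A = np.array([[0.0, 0.8, 0.3],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])   # toy DAG: 0 -> 1, 0 -> 2, 1 -> 2
mu = np.array([2.0, 1.0, 1.0])
deltas = np.array([1.0, 0.7, 1.2])  # one-sparse shift sizes, not used in recovery
n = 200_000

X_obs = simulate_counts(A, mu, n, rng)
X_int = [simulate_counts(A, mu + deltas[k] * np.eye(3)[k], n, rng) for k in range(3)]
A_hat = recover_A_from_mean_shifts(X_obs, X_int)
print(np.round(A_hat, 2))
```

The recovery step never uses `deltas`: because the inverse of $I - A$ has a unit diagonal for any DAG, each shift size can be read off the intervened node's own log-mean change. This toy version needs exactly one intervention per node and large samples; it makes no attempt at sparsity or finite-sample guarantees.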

Figures (5)

  • Figure 1: F1 score of different methods under different graph densities, different mean shift strengths, and varying sample sizes of interventional data.
  • Figure 2: SHD of different methods under different graph densities, different mean shift strengths, and varying sample sizes of interventional data.
  • Figure 3: Precision-Recall curves of different methods on the ChIP-seq dataset.
  • Figure 4: Distributional shifts quantified by $-\log_{10}$ $p$-values from two-sample Kolmogorov--Smirnov (KS) tests comparing perturbed and control conditions. Upstream (ancestors) and downstream (descendants) genes are defined according to the estimated causal graph $\mathcal{G}(\widehat{A}_{\tau})$ obtained from seven different methods. The upper panel pools all gene--perturbation pairs and the lower panel summarizes medians at the perturbation level.
  • Figure 5: Gene regulatory network inferred by PLN-intervn ($\tau=0.2$). Node size indicates connectivity, and node color indicates the out-/in-degree ratio.

Theorems & Definitions (5)

  • Theorem 3.1
  • Lemma 5.1
  • Theorem 5.2: Estimation
  • Remark 5.1
  • Theorem 5.3: Graph Recovery