Gene Regulatory Network Inference in the Presence of Selection Bias and Latent Confounders
Gongxu Luo, Haoyue Dai, Loka Li, Chengqian Gao, Boyang Sun, Kun Zhang
TL;DR
Inferring gene regulatory networks is complicated by latent confounders and pervasive selection bias, both of which can masquerade as regulatory dependencies. The authors introduce GISL, a nonparametric causal discovery algorithm that combines observational and perturbation data to distinguish regulatory edges, latent confounders, and selection processes, exploiting perturbation symmetry and conditional independence (CI) patterns in an augmented DAG. They prove that causal relations, selection, and latent confounders are identifiable up to CI-pattern equivalence under standard assumptions, and they evaluate GISL on synthetic data and real single-cell perturbation datasets, reporting better precision and robustness than baseline methods. The framework separates regulation from the non-regulatory mechanisms of selection and confounding that also shape observed gene expression, which supports more reliable interventional causal discovery in genomics and better-targeted interventions.
Abstract
Gene regulatory network inference (GRNI) aims to discover how genes causally regulate each other from gene expression data. It is well known that statistical dependencies in observed data do not necessarily imply causation, as spurious dependencies may arise from latent confounders, such as non-coding RNAs. Numerous GRNI methods have thus been proposed to address this confounding issue. However, dependencies may also result from selection (only cells satisfying certain survival or inclusion criteria are observed), yet these selection-induced spurious dependencies are frequently overlooked in gene expression data analyses. In this work, we show that such selection is ubiquitous and, when ignored or conflated with true regulation, can lead to flawed causal interpretations and misguided intervention recommendations. To address this challenge, a fundamental question arises: can we distinguish dependencies due to regulation, confounding, and, crucially, selection? We show that gene perturbations offer a simple yet effective answer: selection-induced dependencies are symmetric under perturbation, while those from regulation or confounding are not. Building on this observation, we propose GISL (Gene regulatory network Inference in the presence of Selection bias and Latent confounders), a principled algorithm that leverages perturbation data to uncover both true gene regulatory relations and the non-regulatory mechanisms of selection and confounding, up to an equivalence class. Experiments on synthetic and real-world gene expression data demonstrate the effectiveness of our method.
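The perturbation-symmetry idea can be illustrated with a toy simulation. The sketch below is not GISL and is not taken from the paper: the two-gene linear-Gaussian models, the mean-shift perturbation applied before selection, the threshold-based survival rule, and the names `simulate` and `mean_shift` are all assumptions made for illustration. It replaces conditional independence tests with a plain comparison of means.

```python
import numpy as np


def simulate(mechanism, perturb=None, n=200_000, seed=0):
    """Return (x, y) expression for the observed cells under a toy mechanism."""
    rng = np.random.default_rng(seed)
    # Assumed perturbation model: a mean shift of 2 on the targeted gene.
    shift_x = 2.0 if perturb == "X" else 0.0
    shift_y = 2.0 if perturb == "Y" else 0.0
    if mechanism == "regulation":      # X -> Y
        x = rng.normal(size=n) + shift_x
        y = 0.8 * x + rng.normal(size=n) + shift_y
        keep = np.ones(n, dtype=bool)
    elif mechanism == "confounding":   # X <- latent -> Y
        latent = rng.normal(size=n)
        x = latent + rng.normal(size=n) + shift_x
        y = latent + rng.normal(size=n) + shift_y
        keep = np.ones(n, dtype=bool)
    elif mechanism == "selection":     # X -> S <- Y, only cells with S = 1 are observed
        x = rng.normal(size=n) + shift_x
        y = rng.normal(size=n) + shift_y
        keep = x + y + rng.normal(size=n) > 1.0
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return x[keep], y[keep]


def mean_shift(mechanism, perturbed, measured):
    """Change in the mean of `measured` among observed cells when `perturbed` is shifted."""
    base = dict(zip("XY", simulate(mechanism)))
    pert = dict(zip("XY", simulate(mechanism, perturb=perturbed)))
    return float(pert[measured].mean() - base[measured].mean())


for mechanism in ("regulation", "confounding", "selection"):
    x_to_y = mean_shift(mechanism, "X", "Y")
    y_to_x = mean_shift(mechanism, "Y", "X")
    print(f"{mechanism:12s} perturb X -> shift in Y: {x_to_y:+.2f}   "
          f"perturb Y -> shift in X: {y_to_x:+.2f}")
```

In this toy setting, the selected pair responds in both directions, the regulated pair responds only when the regulator is perturbed, and the confounded pair responds in neither direction. Whether this matches the paper's formal notion of symmetry depends on its perturbation and selection model, which the abstract does not spell out.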
