Learning to refine domain knowledge for biological network inference

Peiwen Li, Menghua Wu

TL;DR

This work proposes an amortized algorithm that refines domain knowledge using observed data. With limited interventional data, the approach outperforms baselines both in recovering ground truth causal graphs and in identifying errors in the prior knowledge.

Abstract

Perturbation experiments allow biologists to discover causal relationships between variables of interest, but the sparsity and high dimensionality of these data pose significant challenges for causal structure learning algorithms. Biological knowledge graphs can bootstrap the inference of causal structures in these situations, but since they compile vastly diverse information, they can bias predictions towards well-studied systems. Alternatively, amortized causal structure learning algorithms encode inductive biases through data simulation and train supervised models to recapitulate these synthetic graphs. However, realistically simulating biology is arguably even harder than understanding a specific system. In this work, we take inspiration from both strategies and propose an amortized algorithm for refining domain knowledge, based on data observations. On real and synthetic datasets, we show that our approach outperforms baselines in recovering ground truth causal graphs and identifying errors in the prior knowledge with limited interventional data.

Paper Structure

This paper contains 23 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Drawing inspiration from amortized causal discovery algorithms, we learn how to refine graph priors, enabling robust graph predictions in low-data regimes.
  • Figure 2: A) At inference, we use biological knowledge graphs as noisy graph priors, which we refine with perturbation data. B) We train an attention-based model to denoise simulated graph priors.
  • Figure 3: Visualization of the ground truth Sachs consensus graph and the CORUM knowledge graph. Blue: Undirected edges present in both CORUM and Sachs. Orange: Undirected edges present in Sachs but not CORUM. 9 of the 17 undirected edges in Sachs are also present in the CORUM graph; of the 38 pairs of nodes that have no relationship in Sachs, 26 also have no relationship in CORUM.
  • Figure 4: Noise detection on synthetic datasets. Ours outperforms DCDI at identifying errors in the graph prior, for both noise levels.
  • Figure 5: Runtime analysis of all algorithms. Amortized inference approaches (SEA, Ours) are orders of magnitude faster.
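
The counts in the Figure 3 caption are enough to quantify how noisy CORUM is as a prior for the Sachs graph. The sketch below is only illustrative arithmetic on those four numbers; the function name `prior_agreement` and the choice of summary metrics are assumptions made here, not part of the paper's evaluation.

```python
# Counts taken from the Figure 3 caption (undirected node pairs).
SACHS_EDGES = 17        # pairs connected in the Sachs consensus graph
SHARED_EDGES = 9        # of those, pairs also connected in CORUM
SACHS_NON_EDGES = 38    # pairs with no relationship in Sachs
SHARED_NON_EDGES = 26   # of those, pairs also unconnected in CORUM


def prior_agreement(tp, n_pos, tn, n_neg):
    """Summarize how well an undirected prior matches a reference graph.

    Hypothetical helper: tp/tn are pairs on which prior and reference agree,
    n_pos/n_neg are the reference's edge and non-edge counts.
    """
    fp = n_neg - tn  # prior edges that the reference does not contain
    return {
        "recall": tp / n_pos,                 # reference edges found in the prior
        "precision": tp / (tp + fp),          # prior edges confirmed by the reference
        "specificity": tn / n_neg,            # reference non-edges absent from the prior
        "pair_accuracy": (tp + tn) / (n_pos + n_neg),
    }


stats = prior_agreement(SHARED_EDGES, SACHS_EDGES, SHARED_NON_EDGES, SACHS_NON_EDGES)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Under these counts, CORUM contains roughly half of the Sachs edges (9 of 17) and disagrees with Sachs on 20 of the 55 node pairs, which is the kind of noisy prior the method is meant to refine.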