
Automated Discovery of Pairwise Interactions from Unstructured Data

Zuheng Xu, Moksh Jain, Ali Denton, Shawn Whitfield, Aniket Didolkar, Berton Earnshaw, Jason Hartford

TL;DR

We derive two interaction tests based on pairwise interventions, validate on several synthetic and real biological experiments that they identify interacting pairs effectively, and show that they recover significantly more known biological interactions than random search and standard active learning baselines.

Abstract

Pairwise interactions between perturbations to a system can provide evidence for the causal dependencies of the system's underlying mechanisms. When observations are low-dimensional, hand-crafted measurements, detecting interactions amounts to simple statistical tests, but it is not obvious how to detect interactions between perturbations that affect latent variables. We derive two interaction tests based on pairwise interventions and show how they can be integrated into an active learning pipeline to efficiently discover pairwise interactions between perturbations. We illustrate the value of these tests in biology, where pairwise perturbation experiments are frequently used to reveal interactions that are not observable from any single perturbation. Our tests can be run on unstructured data, such as the pixels of an image, which enables a more general notion of interaction than typical cell viability experiments and allows the use of cheaper experimental assays. We show on several synthetic and real biological experiments that our tests identify interacting pairs effectively. We evaluate our approach on a real biological experiment in which we knocked out 50 pairs of genes and measured the effect with microscopy images, and show that we recover significantly more known biological interactions than random search and standard active learning baselines.
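The abstract describes integrating the interaction tests into an active learning pipeline. The sketch below shows only the general shape of such a loop, under stated assumptions: `run_test` is a hypothetical placeholder for whichever interaction test scores a queried pair, and the acquisition rule (rank an untested pair by the mean observed score of tested pairs that share a perturbation with it, exploring unseen perturbations first) is a made-up surrogate for illustration, not the authors' acquisition function.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]


def active_pair_search(
    n_perturbations: int,
    run_test: Callable[[Pair], float],
    budget: int,
    batch_size: int = 2,
    seed: int = 0,
) -> Dict[Pair, float]:
    """Greedy active search over unordered pairs (i, j) with i < j.

    `run_test` is a hypothetical stand-in for an interaction test returning a
    score (higher = stronger evidence of interaction). The ranking surrogate is
    also hypothetical: the mean score of tested pairs sharing a perturbation
    with the candidate, or +inf if there are none (so those are explored first).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = random.Random(seed)
    candidates: List[Pair] = list(itertools.combinations(range(n_perturbations), 2))
    observed: Dict[Pair, float] = {}

    def predicted_score(pair: Pair) -> float:
        related = [s for p, s in observed.items() if set(p) & set(pair)]
        return sum(related) / len(related) if related else float("inf")

    while len(observed) < min(budget, len(candidates)):
        untested = [p for p in candidates if p not in observed]
        rng.shuffle(untested)  # random tie-breaking before the stable sort
        untested.sort(key=predicted_score, reverse=True)
        n_query = min(batch_size, budget - len(observed))
        for pair in untested[:n_query]:
            observed[pair] = run_test(pair)  # one "experiment" per queried pair
    return observed
```

For instance, `active_pair_search(4, lambda p: float(p in {(0, 1), (2, 3)}), budget=6)` queries all six pairs of four perturbations and returns a dict mapping each pair to its score. Scoring unseen perturbations as +inf is a deliberately simple exploration choice; a real pipeline would replace both the surrogate and `run_test` with a learned model and an actual interaction test.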

Paper Structure (44 sections, 3 theorems, 45 equations, 8 figures, 1 table, 1 algorithm)

Key Result

Theorem 3.6

Suppose that Assumptions assump:causaldag through assump:diffeo hold, and that $\delta_i, \delta_j$ are separable. Then,

Figures (8)

  • Figure 1: The traditional experimental design loop involves a number of human expert decisions: the expert needs to measure a carefully chosen feature of the experimental outcome (e.g. did a cell survive a perturbation?), formulate a prediction for the behaviour of this feature under the null hypothesis to test for interactions, and select interacting pairs from the combinatorial space of possible perturbations. Our approach enables automated selection of perturbants and testing for interactions directly from raw signal data (e.g. images of cells under a microscope).
  • Figure 2: Separability testing on both the synthetic tabular data using KNN-based KL estimator (left) and NRE-based KL estimator (middle), and the synthetic images (right); brighter colors suggest stronger interactions. Ground truth interacting pairs for both examples are A-B and C-D, which are correctly identified.
  • Figure 3: Disjointedness testing on a synthetic example using MMD-based statistics with a Matérn 2.5 kernel (left) and an RBF kernel (right); brighter colors suggest stronger interactions. The ground truth interacting pairs are D-E and F-G, which are correctly identified by the test.
  • Figure 4: (left) Pairwise separability scores between different CRISPR guides of two genes, TSC2 and MTOR. Missing pairs indicate that data for the corresponding pairwise combination were not collected in our biological experiment. Guides targeting the same gene show high KL scores (red), while guides targeting different genes show low scores (blue). (right) Random samples of the actual single-cell images used in these experiments. Note that detecting the presence or absence of an interaction is extremely difficult, even for trained experts.
  • Figure 5: Pairwise interaction scores for a selected set of $50$ genes using the disjointedness test with a Matérn 2.5 kernel and an RBF kernel (left and middle, respectively), and the separability test (right); brighter colors suggest stronger interactions. Genes from the same pathway are ordered adjacently. The associated pathways of each selected gene are described in Table tab:gene_indices_pathways of Appendix apdx:genegene.
  • ...and 3 more figures
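Figure 2 mentions a KNN-based KL estimator for separability testing. To illustrate how a KL divergence can be estimated directly from two sets of samples (for example, feature vectors collected under two perturbation conditions) without fitting densities, here is one well-known k-nearest-neighbour estimator (Wang, Kulkarni and Verdú, 2009). Which variant the authors used, and how images are turned into vectors, is not stated in this summary; the function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence

Point = Sequence[float]


def _kth_nn_distance(x: Point, pool: List[Point], k: int) -> float:
    """Euclidean distance from x to its k-th nearest neighbour in pool."""
    return heapq.nsmallest(k, (math.dist(x, y) for y in pool))[-1]


def knn_kl_divergence(xs: List[Point], ys: List[Point], k: int = 5) -> float:
    """k-NN estimate of KL(P || Q) from samples xs ~ P and ys ~ Q in R^d.

    D_hat = (d / n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1)),
    where rho_k(i) is the k-th NN distance of xs[i] within xs (excluding
    itself) and nu_k(i) is its k-th NN distance within ys.
    Assumes continuous data (no duplicate points, so rho_k(i) > 0).
    """
    n, m, d = len(xs), len(ys), len(xs[0])
    if n < k + 1 or m < k:
        raise ValueError("too few samples for the chosen k")
    total = 0.0
    for i, x in enumerate(xs):
        rho = _kth_nn_distance(x, xs[:i] + xs[i + 1:], k)  # within-sample
        nu = _kth_nn_distance(x, ys, k)                    # cross-sample
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))
```

The brute-force neighbour search costs O(n(n + m)) distance evaluations, which is fine for a sketch; larger sample sets would normally use a spatial index. Using the same k for both neighbour distances lets the digamma correction terms of the general estimator cancel, leaving the simple form in the docstring.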

Theorems & Definitions (8)

  • Remark 3.3
  • Definition 3.5
  • Theorem 3.6
  • Definition 3.7
  • Theorem 3.9
  • Definition C.1: Kernel mean embedding of a probability measure
  • Proposition C.2: cf. Eq. (3.29) of Muandet et al. (2017)
  • Definition C.3: Characteristic kernel
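Definitions C.1 and C.3 underlie the MMD-based disjointedness statistics of Figure 3: the MMD between P and Q is the RKHS distance between their kernel mean embeddings, and with a characteristic kernel it is zero only when P = Q. Below is a minimal sketch of the unbiased MMD² estimate with the two kernels named in Figure 3 (RBF, and Matérn with ν = 5/2, which is how "Matern 2.5" is read here), plus a permutation test. The lengthscales, permutation count and helper names are assumptions, and this is a generic two-sample test rather than the paper's exact disjointedness statistic.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def rbf_kernel(lengthscale: float = 1.0) -> Kernel:
    """Gaussian/RBF kernel k(x, y) = exp(-||x - y||^2 / (2 l^2))."""
    def k(x: Point, y: Point) -> float:
        r = math.dist(x, y)
        return math.exp(-(r * r) / (2.0 * lengthscale ** 2))
    return k


def matern52_kernel(lengthscale: float = 1.0) -> Kernel:
    """Matern nu = 5/2 kernel: (1 + s + s^2 / 3) exp(-s), s = sqrt(5) ||x - y|| / l."""
    def k(x: Point, y: Point) -> float:
        s = math.sqrt(5.0) * math.dist(x, y) / lengthscale
        return (1.0 + s + s * s / 3.0) * math.exp(-s)
    return k


def _mmd2_from_gram(K: List[List[float]], ix: List[int], iy: List[int]) -> float:
    """Unbiased MMD^2 estimate: within-sample sums exclude the diagonal."""
    n, m = len(ix), len(iy)
    kxx = sum(K[a][b] for a in ix for b in ix if a != b) / (n * (n - 1))
    kyy = sum(K[a][b] for a in iy for b in iy if a != b) / (m * (m - 1))
    kxy = sum(K[a][b] for a in ix for b in iy) / (n * m)
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(
    xs: List[Point], ys: List[Point], k: Kernel,
    num_permutations: int = 200, seed: int = 0,
) -> Tuple[float, float]:
    """Return (MMD^2_u, permutation p-value) for H0: xs and ys share a distribution."""
    if len(xs) < 2 or len(ys) < 2:
        raise ValueError("need at least two samples per group")
    pooled = list(xs) + list(ys)
    K = [[k(a, b) for b in pooled] for a in pooled]  # Gram matrix, computed once
    n = len(xs)
    idx = list(range(len(pooled)))
    observed = _mmd2_from_gram(K, idx[:n], idx[n:])
    rng = random.Random(seed)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(idx)  # random relabelling of the pooled samples
        if _mmd2_from_gram(K, idx[:n], idx[n:]) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + num_permutations)
```

Each kernel factory returns a plain callable, so the same test can be rerun with either kernel, mirroring the two panels of Figure 3. Computing the Gram matrix once and only permuting indices keeps each permutation cheap.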