In-silico biological discovery with large perturbation models
Djordje Miladinovic, Tobias Höppe, Mathieu Chevalley, Andreas Georgiou, Lachlan Stuart, Arash Mehrjou, Marcus Bantscheff, Bernhard Schölkopf, Patrick Schwab
TL;DR
The paper presents the Large Perturbation Model (LPM), a decoder-only framework that disentangles perturbation, readout, and context (PRC) and integrates heterogeneous perturbation experiments by learning a separate embedding for each of these three dimensions. It demonstrates state-of-the-art performance in predicting unseen perturbation outcomes, identifying shared mechanisms across chemical and genetic perturbations, and enabling causal gene-gene network inference via imputed perturbations. The authors validate LPM’s utility through in silico PKD1 upregulation studies in autosomal dominant polycystic kidney disease and a retrospective clinical cohort, linking computational predictions to real-world outcomes while acknowledging limitations and the need for prospective validation. They also show that model performance scales with more data and more contexts, supporting transfer learning across diverse perturbation screens and laying the groundwork for accelerated, data-driven biological discovery. Overall, LPM offers a versatile, scalable approach to deriving mechanistic insights and therapeutic hypotheses from pooled perturbation data, with the potential to guide experiments and clinical decision-making.
Abstract
Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks, ranging from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here, we present the Large Perturbation Model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene-gene interaction networks.
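To make the idea of disentangled perturbation, readout, and context dimensions more concrete, here is a minimal toy sketch in Python. It rests on assumptions of our own and is not the authors' implementation. Each outcome is indexed by a (perturbation, readout, context) triple, and each of the three components has its own embedding table. A small MLP then maps the concatenated embeddings to a scalar predicted readout value. "Decoder-only" is taken here to mean that the model has no encoder over measured input data. The class name `ToyPRCDecoder`, the method `predict_transcriptome`, the sizes `EMB_DIM` and `HIDDEN`, and all labels are hypothetical. The weights are random and untrained.

```python
import math
import random

# Hypothetical sizes chosen for illustration only.
EMB_DIM = 4
HIDDEN = 8


def make_table(names, dim, rng):
    """Return a dict mapping each name to a random embedding vector (list of floats)."""
    return {n: [rng.gauss(0.0, 0.1) for _ in range(dim)] for n in names}


class ToyPRCDecoder:
    """Hypothetical toy model: separate P, R, C embedding tables plus a small MLP decoder.

    Weights are random and untrained; this only illustrates the data flow.
    """

    def __init__(self, perturbations, readouts, contexts, seed=0):
        rng = random.Random(seed)
        # One independent embedding table per dimension (the "disentangled" part).
        self.p_emb = make_table(perturbations, EMB_DIM, rng)
        self.r_emb = make_table(readouts, EMB_DIM, rng)
        self.c_emb = make_table(contexts, EMB_DIM, rng)
        in_dim = 3 * EMB_DIM
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(HIDDEN)]
        self.b1 = [0.0] * HIDDEN
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(HIDDEN)]
        self.b2 = 0.0

    def predict(self, perturbation, readout, context):
        """Predict one scalar outcome for a (perturbation, readout, context) triple."""
        # No encoder over measured data: the input is just the looked-up embeddings,
        # concatenated (list "+" concatenates) into a vector of length 3 * EMB_DIM.
        x = self.p_emb[perturbation] + self.r_emb[readout] + self.c_emb[context]
        # Hidden layer with tanh nonlinearity.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Linear output layer producing a single float.
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def predict_transcriptome(self, perturbation, context):
        """Query every known readout to assemble a predicted post-perturbation profile."""
        return {r: self.predict(perturbation, r, context) for r in self.r_emb}


model = ToyPRCDecoder(["pert_1", "pert_2"], ["gene_A", "gene_B", "gene_C"], ["ctx_1", "ctx_2"])
profile = model.predict_transcriptome("pert_2", "ctx_2")
```

The three tables are independent, so the toy model can score any combination of known components, for example a perturbation observed only in `ctx_1` queried in `ctx_2`. Any component without an embedding raises `KeyError`, so this sketch cannot extrapolate to entirely new perturbations, readouts, or contexts. In an actual model, the tables and decoder weights would be fit jointly on pooled experiments.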
