TxPert: Leveraging Biochemical Relationships for Out-of-Distribution Transcriptomic Perturbation Prediction
Frederik Wenkel, Wilson Tu, Cassandra Masschelein, Hamed Shirzad, Cian Eastwood, Shawn T. Whitfield, Ihab Bendidi, Craig Russell, Liam Hodgson, Yassir El Mesbahi, Jiarui Ding, Marta M. Fay, Berton Earnshaw, Emmanuel Noutahi, Alisandra K. Denton
TL;DR
TxPert introduces a unified, knowledge-graph–guided framework for predicting transcriptomic perturbation effects under out-of-distribution conditions. It combines a basal state encoder with a perturbation encoder that leverages multiple gene–gene interaction networks; the two representations are fused through latent transfer and then decoded to predict perturbation-induced expression for unseen single perturbations, unseen combinatorial perturbations, and novel cell lines. The work contributes rigorous metric design, extensive ablations, and a multi-graph benchmarking approach, and it reports state-of-the-art performance and robust generalization, addressing prior concerns about foundation models in perturbation biology. The framework offers a practical path toward scalable in silico perturbation prediction, with the potential to accelerate drug discovery, cross-context extrapolation, and personalized medicine, and it outlines future directions in few-shot and active learning and in expanded evaluation protocols.
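To make the encoder-fusion-decoder layout above concrete, here is a minimal NumPy sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the random graphs, the linear encoders and decoder, a single averaged message-passing step, and the reading of "latent transfer" as adding a perturbation embedding to the basal latent state. The names `encode_basal`, `encode_perturbation`, and `predict` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, latent_dim = 6, 4  # toy sizes, chosen only for illustration

def normalize_adjacency(adj):
    """Add self-loops and row-normalize an adjacency matrix."""
    adj = adj + np.eye(adj.shape[0])
    return adj / adj.sum(axis=1, keepdims=True)

# Two random stand-ins for gene-gene interaction networks (assumption).
graphs = [
    normalize_adjacency((rng.random((n_genes, n_genes)) > 0.6).astype(float))
    for _ in range(2)
]

# Randomly initialized parameters; a real model would learn these.
W_basal = rng.normal(size=(n_genes, latent_dim))
gene_emb = rng.normal(size=(n_genes, latent_dim))
W_dec = rng.normal(size=(latent_dim, n_genes))

def encode_basal(control_expr):
    """Map a control expression profile to a latent basal state."""
    return control_expr @ W_basal

def encode_perturbation(perturbed_genes):
    """Embed perturbed genes after one propagation step averaged over graphs."""
    idx = np.asarray(perturbed_genes, dtype=int)
    propagated = np.mean([g @ gene_emb for g in graphs], axis=0)
    return propagated[idx].sum(axis=0)

def predict(control_expr, perturbed_genes):
    """Shift the basal latent by the perturbation embedding, then decode."""
    z = encode_basal(control_expr) + encode_perturbation(perturbed_genes)
    return z @ W_dec

control = rng.random(n_genes)
single = predict(control, [1])      # single perturbation of gene index 1
double = predict(control, [1, 3])   # combinatorial perturbation
```

In this sketch a double perturbation is simply the sum of two single-gene embeddings, which is a deliberately simple choice; how a real model should combine perturbations is a separate design decision.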
Abstract
Accurately predicting cellular responses to genetic perturbations is essential for understanding disease mechanisms and designing effective therapies. Yet exhaustively exploring the space of possible perturbations (e.g., multi-gene perturbations or across tissues and cell types) is prohibitively expensive, motivating methods that can generalize to unseen conditions. In this work, we explore how knowledge graphs of gene-gene relationships can improve out-of-distribution (OOD) prediction across three challenging settings: unseen single perturbations; unseen double perturbations; and unseen cell lines. In particular, we present: (i) TxPert, a new state-of-the-art method that leverages multiple biological knowledge networks to predict transcriptional responses under OOD scenarios; (ii) an in-depth analysis demonstrating the impact of graphs, model architecture, and data on performance; and (iii) an expanded benchmarking framework that strengthens evaluation standards for perturbation modeling.
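The three OOD settings named in the abstract can be illustrated with a small data-splitting sketch. This is a hypothetical helper, not the paper's benchmarking code: the function name `split_ood`, the record layout `(cell_line, genes)`, and the placeholder gene and cell-line labels are all assumptions. It also leaves open whether the genes in a held-out double perturbation were seen individually during training.

```python
def split_ood(records, held_out_singles, held_out_doubles, held_out_cell_line):
    """Assign each (cell_line, genes) record to train or to one OOD test split."""
    splits = {
        "train": [],
        "unseen_single": [],
        "unseen_double": [],
        "unseen_cell_line": [],
    }
    for cell_line, genes in records:
        combo = frozenset(genes)
        if cell_line == held_out_cell_line:
            key = "unseen_cell_line"   # every record from this cell line is held out
        elif len(combo) == 1 and combo <= held_out_singles:
            key = "unseen_single"      # the single perturbed gene never appears in training
        elif len(combo) == 2 and combo in held_out_doubles:
            key = "unseen_double"      # this exact gene pair never appears in training
        else:
            key = "train"
        splits[key].append((cell_line, genes))
    return splits

# Placeholder labels, used only for illustration.
records = [
    ("line_A", ("GENE_A",)),
    ("line_A", ("GENE_B",)),
    ("line_A", ("GENE_A", "GENE_B")),
    ("line_A", ("GENE_C",)),
    ("line_B", ("GENE_A",)),
]
splits = split_ood(
    records,
    held_out_singles={"GENE_C"},
    held_out_doubles={frozenset({"GENE_A", "GENE_B"})},
    held_out_cell_line="line_B",
)
```

Checking the cell line first puts every record from the held-out line in one split, whatever its perturbation, so the four splits never overlap.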
