Efficient Fine-Tuning of Single-Cell Foundation Models Enables Zero-Shot Molecular Perturbation Prediction
Sepideh Maleki, Jan-Christian Huetter, Kangway V. Chuang, David Richmond, Gabriele Scalia, Tommaso Biancalani
TL;DR
This work tackles the prediction of transcriptional responses to novel molecular perturbations in single-cell data when experimental data are scarce. It introduces scDCA, a drug-conditional adapter that enables parameter-efficient fine-tuning of a frozen single-cell foundation model (scGPT) by conditioning it on molecule embeddings from a pre-trained molecular encoder (ChemBERTa). The approach achieves state-of-the-art performance on unseen drugs and unseen cell lines on the sci-Plex dataset, with gains most evident in $R^2$ and strong zero-shot generalization, while training less than 1% of the base model's parameters. For drug discovery and cellular modeling, this makes it practical to adapt large single-cell models to chemical perturbations with few labeled examples.
Abstract
Predicting transcriptional responses to novel drugs provides a unique opportunity to accelerate biomedical research and advance drug discovery efforts. However, the inherent complexity and high dimensionality of cellular responses, combined with the extremely limited available experimental data, make the task challenging. In this study, we leverage single-cell foundation models (FMs) pre-trained on tens of millions of single cells, encompassing multiple cell types, states, and disease annotations, to address molecular perturbation prediction. We introduce a drug-conditional adapter that allows efficient fine-tuning by training less than 1% of the original foundation model, thus enabling molecular conditioning while preserving the rich biological representation learned during pre-training. The proposed strategy allows not only the prediction of cellular responses to novel drugs, but also zero-shot generalization to unseen cell lines. We establish a robust evaluation framework to assess model performance across different generalization tasks, demonstrating state-of-the-art results across all settings, with significant improvements in few-shot and zero-shot generalization to new cell lines compared to existing baselines.
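
To make the idea of a drug-conditional adapter on a frozen backbone concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a bottleneck adapter with a residual connection, in which a molecule embedding shifts the bottleneck activations. The layer sizes, the single `W_frozen` matrix standing in for the foundation model, and all function names are hypothetical and chosen only for illustration; the abstract does not specify the adapter architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
D_MODEL = 1024     # hidden width of the frozen backbone layer
D_DRUG = 384       # width of the molecule embedding
D_BOTTLENECK = 2   # adapter bottleneck width

# Stand-in for one frozen backbone layer: its weights are never updated.
W_frozen = rng.normal(scale=0.02, size=(D_MODEL, D_MODEL))

# Trainable adapter parameters (the only weights that would receive gradients).
adapter = {
    "down": rng.normal(scale=0.02, size=(D_MODEL, D_BOTTLENECK)),
    "drug": rng.normal(scale=0.02, size=(D_DRUG, D_BOTTLENECK)),
    # Zero initialization: the adapter starts as the identity map,
    # so the pre-trained representation is untouched before training.
    "up": np.zeros((D_BOTTLENECK, D_MODEL)),
}


def frozen_layer(h):
    """Toy frozen layer applied to gene-token states h of shape (n_genes, D_MODEL)."""
    return np.tanh(h @ W_frozen)


def drug_conditional_adapter(h, drug_emb, params):
    """Bottleneck adapter whose hidden activations are shifted by the drug embedding."""
    # The drug term has shape (D_BOTTLENECK,) and broadcasts over all gene tokens.
    z = h @ params["down"] + drug_emb @ params["drug"]
    z = np.maximum(z, 0.0)          # ReLU non-linearity
    return h + z @ params["up"]     # residual connection around the adapter


def forward(h, drug_emb, params):
    """Frozen layer followed by the drug-conditional adapter."""
    return drug_conditional_adapter(frozen_layer(h), drug_emb, params)


def trainable_fraction(params):
    """Share of all parameters in this toy model that belong to the adapter."""
    n_adapter = sum(w.size for w in params.values())
    return n_adapter / (n_adapter + W_frozen.size)
```

With these toy sizes, `trainable_fraction(adapter)` falls below 1%, which mirrors the parameter budget described above without reproducing the paper's actual count. Because the up-projection starts at zero, `forward` initially returns exactly the frozen layer's output, one common way to preserve pre-trained representations at the start of fine-tuning.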
