Efficient Fine-Tuning of Single-Cell Foundation Models Enables Zero-Shot Molecular Perturbation Prediction

Sepideh Maleki, Jan-Christian Huetter, Kangway V. Chuang, David Richmond, Gabriele Scalia, Tommaso Biancalani

TL;DR

This work tackles the challenge of predicting transcriptional responses to novel molecular perturbations in single-cell data under data scarcity. It introduces scDCA, a drug-conditional adapter for parameter-efficient fine-tuning of a frozen single-cell foundation model (scGPT), conditioned on molecule embeddings from a pre-trained molecular encoder (ChemBERTa). The approach achieves state-of-the-art performance on unseen drugs and unseen cell lines, demonstrated on the sci-Plex dataset with substantial $R^2$ gains and strong zero-shot generalization, while training less than 1% of the base model's parameters. By enabling robust, low-resource adaptation of large-scale single-cell models to chemical perturbations, the method has practical value for drug discovery and cellular modeling.
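
The drug-conditional adapter idea can be illustrated as a bottleneck adapter whose down- and up-projection biases are computed from a molecule embedding, so the same frozen hidden states yield different outputs for different drugs. This is a minimal NumPy sketch under stated assumptions: the linear maps from the molecule embedding to the biases, the ReLU nonlinearity, the residual connection, the class name, and all dimensions are illustrative choices, not the paper's exact parameterization. The frozen scGPT layer that produces `h` and the molecular encoder that produces the embedding are not shown.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class DrugConditionalAdapter:
    """Hypothetical bottleneck adapter with molecule-dependent biases.

    Assumption: biases are linear functions of the molecule embedding.
    Only these adapter weights would be trained; the host model stays frozen.
    """

    def __init__(self, d_model, d_bottleneck, d_mol, rng):
        scale = 0.02
        # Trainable projection weights of the adapter.
        self.w_down = rng.normal(0.0, scale, (d_model, d_bottleneck))
        self.w_up = rng.normal(0.0, scale, (d_bottleneck, d_model))
        # Assumed linear maps that turn a molecule embedding into biases.
        self.mol_to_b_down = rng.normal(0.0, scale, (d_mol, d_bottleneck))
        self.mol_to_b_up = rng.normal(0.0, scale, (d_mol, d_model))

    def __call__(self, h, mol_emb):
        # h: (n_tokens, d_model) hidden states from a frozen transformer layer.
        # mol_emb: (d_mol,) embedding of the drug from a frozen encoder.
        b_down = mol_emb @ self.mol_to_b_down  # drug-specific down-projection bias
        b_up = mol_emb @ self.mol_to_b_up      # drug-specific up-projection bias
        z = relu(h @ self.w_down + b_down)     # down-project into the bottleneck
        return h + z @ self.w_up + b_up        # up-project and add the residual

    def n_params(self):
        # Count of trainable adapter parameters only.
        params = (self.w_down, self.w_up, self.mol_to_b_down, self.mol_to_b_up)
        return sum(p.size for p in params)


# Illustrative usage with arbitrary dimensions.
rng = np.random.default_rng(0)
adapter = DrugConditionalAdapter(d_model=512, d_bottleneck=16, d_mol=768, rng=rng)
h = rng.normal(size=(100, 512))
out_a = adapter(h, rng.normal(size=768))  # hidden states conditioned on drug A
out_b = adapter(h, rng.normal(size=768))  # same hidden states, drug B
print(out_a.shape, adapter.n_params())
```

Keeping the host weights frozen and training only small adapter matrices is what keeps the trainable fraction low; how small it is relative to the base model depends on the adapter dimensions chosen.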

Abstract

Predicting transcriptional responses to novel drugs provides a unique opportunity to accelerate biomedical research and advance drug discovery efforts. However, the inherent complexity and high dimensionality of cellular responses, combined with the extremely limited experimental data available, make the task challenging. In this study, we leverage single-cell foundation models (FMs) pre-trained on tens of millions of single cells, encompassing multiple cell types, states, and disease annotations, to address molecular perturbation prediction. We introduce a drug-conditional adapter that allows efficient fine-tuning by training less than 1% of the original foundation model, thus enabling molecular conditioning while preserving the rich biological representation learned during pre-training. The proposed strategy allows not only the prediction of cellular responses to novel drugs, but also zero-shot generalization to unseen cell lines. We establish a robust evaluation framework to assess model performance across different generalization tasks, demonstrating state-of-the-art results across all settings, with significant improvements in few-shot and zero-shot generalization to new cell lines compared to existing baselines.
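
The difference between the "unseen drug" and "unseen cell line" settings can be pictured on a cell-line × drug grid, as in the problem formulation of Figure 2: holding out a drug sends every (cell line, drug) pair involving it to the test set, and holding out a cell line does the same for its whole row. This is a hypothetical sketch; the function name, the placeholder cell-line and drug names, and the split rule are illustrative assumptions, not the paper's evaluation protocol.

```python
from itertools import product


def split_pairs(cell_lines, drugs, held_out_cells=(), held_out_drugs=()):
    """Partition (cell_line, drug) perturbation pairs into train and test.

    Hypothetical rule: a pair is held out if its cell line or its drug is
    held out, so no perturbed measurement of a held-out entity is trained on.
    """
    train, test = [], []
    for cell, drug in product(cell_lines, drugs):
        if cell in held_out_cells or drug in held_out_drugs:
            test.append((cell, drug))
        else:
            train.append((cell, drug))
    return train, test


# Placeholder names for illustration only.
cells = ["cell_1", "cell_2", "cell_3"]
drugs = ["drug_1", "drug_2", "drug_3"]

# Unseen-drug setting: one drug (column) is never seen in training.
train_d, test_d = split_pairs(cells, drugs, held_out_drugs={"drug_3"})
# Unseen-cell-line setting: one cell line (row) is never seen in training.
train_c, test_c = split_pairs(cells, drugs, held_out_cells={"cell_3"})
print(len(train_d), len(test_d), len(train_c), len(test_c))
```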

Paper Structure

This paper contains 22 sections, 8 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: scDCA architecture. The upper left shows the input embedding to scGPT, which consists of gene tokens and unperturbed gene expression values. The input is passed through scGPT, a stack of transformer blocks in which each layer incorporates a drug-conditional adapter module. The primary goal of this adapter is parameter-efficient fine-tuning: it uses molecule embeddings to dynamically adjust the biases of the down-projection and up-projection layers. The weights of the original transformer layers are frozen to reduce the number of trainable parameters.
  • Figure 2: Problem formulation. Rows represent different cell lines and columns represent different drugs. Green boxes show training samples and white boxes show test samples. Previous works focused only on tasks (a) and (b).
  • Figure 3: Comparison with different baselines. scDCA is our proposed method. X-axis represents $R^2$ (Mean $\pm$ SE).
  • Figure 4: Examples of predicted gene expression across the 20 most differentially expressed genes for the molecules Quisinostat and Dacinostat.
  • Figure 5: Clustering drugs based on their targets. X-axis represents targets.
  • ...and 4 more figures
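
Figure 3 reports $R^2$ as mean ± standard error. A common convention in perturbation-prediction benchmarks is to compute $R^2$ between the true and predicted mean expression profiles of each drug/cell-line condition, optionally restricted to a subset of genes such as the top differentially expressed ones, and then summarize across conditions. The sketch below assumes that convention; the function names are hypothetical, and the paper's exact protocol (gene subset, averaging, aggregation) is not specified in this summary.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def condition_r2(true_cells, pred_cells, gene_idx=None):
    """R^2 between mean true and mean predicted expression for one condition.

    true_cells, pred_cells: (n_cells, n_genes) arrays for one drug/cell line.
    gene_idx: optional gene indices (assumed, e.g. top DE genes) to score on.
    """
    true_mean = np.asarray(true_cells, dtype=float).mean(axis=0)
    pred_mean = np.asarray(pred_cells, dtype=float).mean(axis=0)
    if gene_idx is not None:
        true_mean, pred_mean = true_mean[gene_idx], pred_mean[gene_idx]
    return r2_score(true_mean, pred_mean)


def mean_and_se(values):
    """Mean and standard error of per-condition scores."""
    values = np.asarray(values, dtype=float)
    se = values.std(ddof=1) / np.sqrt(len(values))
    return values.mean(), se
```

Averaging over cells before scoring measures how well the predicted population-level shift matches the observed one, which is why the per-gene means rather than individual cells are compared.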