Sample-Efficient Adaptation of Drug-Response Models to Patient Tumors under Strong Biological Domain Shift

Camille Jimenez Cortes, Philippe Lalanda, German Vega

Abstract

Predicting drug response in patients from preclinical data remains a major challenge in precision oncology due to the substantial biological gap between in vitro cell lines and patient tumors. Rather than aiming to improve absolute in vitro prediction accuracy, this work examines whether explicitly separating representation learning from task supervision enables more sample-efficient adaptation of drug-response models to patient data under strong biological domain shift. We propose a staged transfer-learning framework in which cellular and drug representations are first learned independently from large collections of unlabeled pharmacogenomic data using autoencoder-based representation learning. These representations are then aligned with drug-response labels on cell-line data and subsequently adapted to patient tumors using few-shot supervision. Through a systematic evaluation spanning in-domain, cross-dataset, and patient-level settings, we show that unsupervised pretraining provides limited benefit when source and target domains overlap substantially, but yields clear gains when adapting to patient tumors with very limited labeled data. In particular, the proposed framework achieves faster performance improvements during few-shot patient-level adaptation while maintaining comparable accuracy to single-phase baselines on standard cell-line benchmarks. Overall, these results demonstrate that learning structured and transferable representations from unlabeled molecular profiles can substantially reduce the amount of clinical supervision required for effective drug-response prediction, offering a practical pathway toward data-efficient preclinical-to-clinical translation.
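The three stages described above (unsupervised representation learning, supervised alignment on cell lines, few-shot adaptation to patients) can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration, not the authors' implementation: the data are synthetic, PCA stands in for the autoencoders (it is the closed-form optimum of a linear autoencoder), the predictor is a logistic-regression head, the encoders are kept frozen during adaptation, and names such as `fit_linear_encoder` and `train_head` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_encoder(X, k):
    """Stage 1 stand-in: PCA, the closed-form optimum of a linear autoencoder."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k].T  # (n_features, k) projection onto the top-k components
    return lambda A: (A - mu) @ W

def train_head(Z, y, w=None, b=0.0, lr=0.5, steps=300):
    """Logistic-regression head fit by gradient descent; w and b allow warm starts."""
    w = np.zeros(Z.shape[1]) if w is None else w.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        err = p - y
        w -= lr * (Z.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

# Synthetic stand-ins for unlabeled cell-line profiles and drug descriptors.
cells = rng.normal(size=(500, 30))
drugs = rng.normal(size=(100, 20))

# Stage 1: learn cell and drug encoders independently, without labels.
encode_cell = fit_linear_encoder(cells, k=8)
encode_drug = fit_linear_encoder(drugs, k=4)

def pair_features(cell_x, drug_x):
    """Concatenate the two latent codes for each (cell, drug) pair."""
    return np.hstack([encode_cell(cell_x), encode_drug(drug_x)])

# Stage 2: align the latent codes with (toy) response labels on cell-line pairs.
ci = rng.integers(0, 500, size=2000)
di = rng.integers(0, 100, size=2000)
Z_src = pair_features(cells[ci], drugs[di])
y_src = (Z_src[:, 0] + Z_src[:, 8] > 0).astype(float)  # toy labels, assumption
w_src, b_src = train_head(Z_src, y_src)

# Stage 3: few-shot adaptation on a handful of shifted "patient" samples,
# warm-starting from the stage-2 head with a few small steps.
patients = rng.normal(loc=0.5, size=(16, 30))
pdi = rng.integers(0, 100, size=16)
Z_tgt = pair_features(patients, drugs[pdi])
y_tgt = (Z_tgt[:, 0] + Z_tgt[:, 8] > 0).astype(float)
w_tgt, b_tgt = train_head(Z_tgt, y_tgt, w=w_src, b=b_src, lr=0.1, steps=20)

src_acc = (((Z_src @ w_src + b_src) > 0) == (y_src > 0.5)).mean()
```

The point of the sketch is the separation of concerns: the encoders never see a label, and only the small head is updated when labeled patient samples arrive. Whether the actual framework freezes or fine-tunes its encoders at that stage is not stated in the abstract.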

Paper Structure (23 sections, 6 figures)

Figures (6)

  • Figure 1: STaR-DR framework. Cell and drug features are independently encoded into latent representations and combined for drug-response prediction (DRP). Training includes unsupervised pretraining on CTRP–GDSC, supervised alignment on cell-line response data, and few-shot adaptation to TCGA.
  • Figure 2: In-domain cross-validation performance under leave-out protocols. Five-fold cross-validation on the CTRP–GDSC dataset using cell-line gene expression and mutation profiles combined with drug descriptors and Morgan fingerprints. Bars report mean $\pm$ s.d. balanced accuracy (left) and area under the precision–recall curve (AUPRC, right) across three evaluation settings: standard pair-level split (Baseline), Leave-Drug-Out (LDO), and Leave-Cell-Out (LCO).
  • Figure 3: Cross-dataset performance on CCLE. Models trained on CTRP–GDSC are evaluated on CCLE. (A–B) ROC (top) and precision–recall (bottom) curves averaged over 10 runs. (A) STaR-DR. (B) Single-phase baseline (AE-MLP). Shaded regions indicate $\pm$1 s.d. (C) Balanced accuracy (mean $\pm$ s.d.).
  • Figure 4: Few-shot adaptation to TCGA under strong domain shift. ROC-AUC (blue) and area under the precision–recall curve (AUPRC, orange) as a function of the number of labeled TCGA patient samples used for adaptation. (A) STaR-DR. (B) Single-phase baseline (AE-MLP). Curves report the mean over 5 independent runs; shaded bands indicate $\pm$1 s.d.
  • Figure 5: PCA of cell-line molecular profiles across datasets. Centroids are connected by Mahalanobis distances computed in PCA space (2D, pooled covariance). Distances are 0.597 between CTRP+GDSC and CCLE, 18.821 between CTRP+GDSC and TCGA, and 18.609 between CCLE and TCGA, based on gene expression and mutation profiles.
  • ...and 1 more figures
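
The Figure 5 caption describes centroid-to-centroid Mahalanobis distances computed in a 2D PCA space with a pooled covariance. A minimal sketch of that kind of computation follows, under several assumptions: the data are synthetic stand-ins (not CTRP, GDSC, CCLE, or TCGA profiles), PCA is fit on all groups stacked together, and the covariance is pooled across all three groups. The caption does not specify those last two choices, and the function names are hypothetical.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def pooled_covariance(groups):
    """Within-group covariance pooled over groups, weighted by (n_i - 1)."""
    num = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return num / den

def centroid_mahalanobis(a, b, cov):
    """Mahalanobis distance between the centroids of groups a and b."""
    diff = a.mean(axis=0) - b.mean(axis=0)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(0)
# Synthetic stand-ins: two similar "cell-line" groups and one shifted "tumor" group.
cell_a = rng.normal(0.0, 1.0, size=(200, 50))
cell_b = rng.normal(0.1, 1.0, size=(150, 50))
tumors = rng.normal(2.0, 1.0, size=(180, 50))

# Fit PCA on the stacked data, then split the 2D scores back into groups.
Z = pca_project(np.vstack([cell_a, cell_b, tumors]), n_components=2)
za, zb, zt = Z[:200], Z[200:350], Z[350:]

cov = pooled_covariance([za, zb, zt])
d_ab = centroid_mahalanobis(za, zb, cov)  # between the two cell-line groups
d_at = centroid_mahalanobis(za, zt, cov)  # cell-line group vs. tumor group
```

With data built this way, the distance between the two cell-line groups is small relative to the distance from either to the shifted group, which is the qualitative pattern the caption reports (0.597 versus roughly 18.6 to 18.8).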