CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs
Yiqing Zhang, Xiaozhong Liu, Fabricio Murai
TL;DR
CLaDMoP tackles clinical trial outcome prediction when labeled data are limited by combining a large language model (LLM) branch for eligibility criteria with a lightweight drug-molecule branch, connected through multi-level fusion and grouping blocks. It introduces the Successful Clinical Trials (SCT) dataset and uses a pair-matching contrastive pretraining objective ($\mathcal{L}_{\mathrm{InfoNCE}}$) while keeping the LLM frozen, followed by parameter-efficient fine-tuning (PEFT) with LoRA, to achieve strong performance on the Trial Outcome Prediction (TOP) benchmark. The approach yields gains of up to $10.5\%$ in PR-AUC and $3.6\%$ in ROC-AUC over baselines, with notable improvements for Phase I/II trials and good generalization to new diseases. These results suggest that task-agnostic pretraining combined with efficient cross-branch fusion can produce robust, transferable representations for clinical trial outcome prediction and enable data-efficient adaptation to new trials.
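To make the pair-matching objective concrete, here is a minimal sketch of a symmetric InfoNCE loss in plain Python. It assumes that each positive pair is the eligibility-criteria embedding and the drug embedding of the same trial, that the other trials in the batch act as negatives, and that similarities are cosine similarities scaled by a temperature. The function names and the default temperature of 0.07 are hypothetical illustrations, not taken from the CLaDMoP code.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce_pair_matching(
    criteria: List[Vector], drugs: List[Vector], temperature: float = 0.07
) -> float:
    """Hypothetical symmetric InfoNCE: criteria[i] should match drugs[i] only."""
    c = [l2_normalize(v) for v in criteria]
    d = [l2_normalize(v) for v in drugs]
    # logits[i][j]: scaled similarity between trial i's criteria and trial j's drugs
    logits = [[sum(a * b for a, b in zip(ci, dj)) / temperature for dj in d] for ci in c]
    transposed = [list(col) for col in zip(*logits)]
    # Average the criteria-to-drug and drug-to-criteria matching losses
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))
```

With correctly paired embeddings the loss is close to zero, while shuffling the drug embeddings so that each criteria vector aligns with another trial's drugs drives it up sharply; this contrast is what the proxy task rewards without any outcome labels.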
Abstract
Many existing models for clinical trial outcome prediction are optimized using task-specific loss functions on trial phase-specific data. While this scheme may boost prediction for common diseases and drugs, it can hinder the learning of generalizable representations, leading to more false positives and false negatives. To address this limitation, we introduce CLaDMoP, a new pre-training approach for clinical trial outcome prediction, alongside the Successful Clinical Trials (SCT) dataset, specifically designed for this task. CLaDMoP leverages a Large Language Model (LLM) to encode trials' eligibility criteria and links it to a lightweight Drug-Molecule branch through a novel multi-level fusion technique. To efficiently fuse long embeddings across levels, we incorporate a grouping block that drastically reduces computational overhead. CLaDMoP avoids reliance on task-specific objectives by pre-training on a "pair matching" proxy task. Compared to established zero-shot and few-shot baselines, our method significantly improves both PR-AUC and ROC-AUC, especially for phase I and phase II trials. We further evaluate CLaDMoP after Parameter-Efficient Fine-Tuning (PEFT), perform ablation studies, and compare it to state-of-the-art supervised baselines, including MEXA-CTP, on the Trial Outcome Prediction (TOP) benchmark. CLaDMoP achieves up to 10.5% improvement in PR-AUC and 3.6% in ROC-AUC, while attaining an F1 score comparable to that of MEXA-CTP, highlighting its potential for clinical trial outcome prediction. Code and the SCT dataset can be downloaded from https://github.com/murai-lab/CLaDMoP.
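The abstract does not specify how the grouping block works, so the following is only a hypothetical stand-in that illustrates why shortening a long embedding sequence before fusion saves computation. It mean-pools contiguous chunks of token embeddings into a fixed number of group embeddings; the real block may instead use learned group tokens or attention-based assignment. The names `group_tokens` and `cross_attention_scores`, as well as the sizes used below, are illustrative assumptions.

```python
from typing import List

Vector = List[float]


def group_tokens(tokens: List[Vector], num_groups: int) -> List[Vector]:
    """Hypothetical grouping: mean-pool a long token sequence into contiguous chunks."""
    n = len(tokens)
    if not 1 <= num_groups <= n:
        raise ValueError("num_groups must be between 1 and len(tokens)")
    dim = len(tokens[0])
    groups = []
    for g in range(num_groups):
        # Near-equal contiguous chunks; each is non-empty because num_groups <= n
        start = g * n // num_groups
        end = (g + 1) * n // num_groups
        chunk = tokens[start:end]
        groups.append([sum(t[k] for t in chunk) / len(chunk) for k in range(dim)])
    return groups


def cross_attention_scores(num_queries: int, num_keys: int) -> int:
    """Number of query-key dot products computed by one cross-attention layer."""
    return num_queries * num_keys
```

For example, if an eligibility-criteria branch produced 512 token embeddings and the drug branch 64, a single cross-attention layer would compute `cross_attention_scores(512, 64)`, or 32768 scores; grouping the criteria tokens into 16 groups first reduces this to 1024. These sizes are made up for illustration and are not figures reported for CLaDMoP.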
