CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models

Michael Reinisch; Jianfeng He; Chenxi Liao; Sauleh Ahmad Siddiqui; Bei Xiao

CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models

Michael Reinisch, Jianfeng He, Chenxi Liao, Sauleh Ahmad Siddiqui, Bei Xiao

TL;DR

This work tackles the challenge of forecasting clinical trial phase transitions from protocol text, addressing high attrition and costly development. It introduces CTP-LLM, a GPT-3.5 Turbo–based model, and the PhaseTransition Dataset to automatically predict whether a trial advances to the next phase using only protocol documents. The results show that CTP-LLM achieves substantial predictive power, with an overall accuracy of $67\%$ across all phases and $75\%$ for Phase III→Approval, outperforming baselines like BERT+RF and Longformer. The study demonstrates the feasibility and value of LLM-powered CTOP, highlights critical trial-design features (e.g., Criteria, Brief), and provides a public dataset and benchmark to spur further research in protocol-driven outcome forecasting.

Abstract

New medical treatment development requires multiple phases of clinical trials. Despite the significant human and financial costs of bringing a drug to market, less than 20% of drugs in testing will make it from the first phase to final approval. Recent literature indicates that the design of the trial protocols significantly contributes to trial performance. We investigated Clinical Trial Outcome Prediction (CTOP) using trial design documents to predict phase transitions automatically. We propose CTP-LLM, the first Large Language Model (LLM) based model for CTOP. We also introduce the PhaseTransition (PT) Dataset; which labels trials based on their progression through the regulatory process and serves as a benchmark for CTOP evaluation. Our fine-tuned GPT-3.5-based model (CTP-LLM) predicts clinical trial phase transition by analyzing the trial's original protocol texts without requiring human-selected features. CTP-LLM achieves a 67% accuracy rate in predicting trial phase transitions across all phases and a 75% accuracy rate specifically in predicting the transition from Phase~III to final approval. Our experimental performance highlights the potential of LLM-powered applications in forecasting clinical trial outcomes and assessing trial design.

CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models

TL;DR

across all phases and

for Phase III→Approval, outperforming baselines like BERT+RF and Longformer. The study demonstrates the feasibility and value of LLM-powered CTOP, highlights critical trial-design features (e.g., Criteria, Brief), and provides a public dataset and benchmark to spur further research in protocol-driven outcome forecasting.

Abstract

Paper Structure (29 sections, 10 equations, 6 figures, 7 tables)

This paper contains 29 sections, 10 equations, 6 figures, 7 tables.

Introduction
Related Work
Method
Problem Setup
Model Overview
BERT+RF
CTP-LLM
PhaseTransition Dataset Construction
Data Synthesis
Model Training
BERT+RF
CTP-LLM
Experimental Results
Statistics
Data Split
...and 14 more sections

Figures (6)

Figure 1: Overview of our framework, CTP-LLM, for clinical trial phase transition prediction. A trial protocol is a comprehensive document that outlines the plan for conducting the trial (see Appendix \ref{['sec:app_protocol']}). A treatment is typically tested in three phases, starting with safety evaluation and dosage in Phase I with a small group of people, then assessing efficacy in Phase II with a larger group, and finally confirming efficacy and safety in Phase III with a large population before FDA approval. However, the treatment can drop out in any phase. Our model takes a protocol of a given phase as input and predicts whether it can successfully transition to the next phase before the trial starts.
Figure 2: Overview of the two models. On the left is the BERT+RF approach, where the trial textual description $x_D$ is divided into its entries, individually embedded by the clinical BERT, concatenated, and then inputted into the RF classifier. On the right are the two steps of the CTP-LLM approach. First, the instruction fine-tuning of the base model $f$, using trial description $x_D$, the prompt $h_C$, and the labels $y$ as inputs to the fine-tuning function $\Phi$, resulting in the fine-tuned model, CTP-LLM ($f$). For an example of the prompt, refer to Table \ref{['tab:prompt']} in Appendix. CTP-LLM only requires the prompt and a trial description to generate a prediction.
Figure 3: Overview of the labelling process as described in Section \ref{['sec:labelling']}.
Figure 4: Distribution of Passed, Failed, and Unknown Outcome Trials in our dataset. Phase II has the highest attrition rate while also having the most entered trials. Our reported outcome distribution differs from the classical literature dimasi2010trendshay2014clinicalkola2004can as we use a novel labeling method for trial success. Still, it is in accordance with previous work using a similar approach as us feijoo2020key.
Figure 5: Overview of Drug Classes and Their Impact on Clinical Trial Outcomes
...and 1 more figures

CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models

TL;DR

Abstract

CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models

Authors

TL;DR

Abstract

Table of Contents

Figures (6)