TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model

Yue Wang; Tianfan Fu; Yinlong Xu; Zihan Ma; Hongxia Xu; Yingzhou Lu; Bang Du; Honghao Gao; Jian Wu

TL;DR

TWIN-GPT uses a ChatGPT-based large language model fine-tuned on clinical-trial data to create personalized digital twins for virtual clinical trials. Using prompt tuning and k-nearest-neighbor context, it generates patient-specific trajectories that improve outcome prediction and enable counterfactual analyses while addressing data gaps and inconsistencies in EHRs. The approach reproduces the distribution of event patterns with high fidelity, keeps predictive performance for severe outcomes and adverse events comparable to that obtained with real data, and shows favorable privacy metrics for presence disclosure, attribute disclosure, and NNAA risk. The work has practical implications for accelerating trial design, reducing participant burden, and enhancing patient safety through in silico evaluation grounded in the clinical knowledge embedded in LLMs.

Abstract

Clinical trials are indispensable for medical research and the development of new treatments. However, they often involve thousands of participants, can take several years to complete, and carry a high probability of failure. Interest has recently grown in virtual clinical trials, which simulate real-world scenarios and have the potential to significantly enhance patient safety, expedite development, reduce costs, and contribute to broader scientific knowledge in healthcare. Existing research often focuses on leveraging electronic health records (EHRs) to support clinical trial outcome prediction. Yet, because they are trained on limited clinical trial outcome data, existing approaches frequently struggle to make accurate predictions. Some research has attempted to generate EHRs to augment model development but has fallen short of personalizing the generation for individual patient profiles. The emergence of large language models has opened new possibilities, as their embedded clinical knowledge has proven beneficial in addressing medical problems. In this paper, we propose a large language model-based digital twin creation approach, called TWIN-GPT. Given limited data, TWIN-GPT can establish cross-dataset associations of medical information and generate a unique, personalized digital twin for each patient, thereby preserving individual patient characteristics. Comprehensive experiments show that using digital twins created by TWIN-GPT improves clinical trial outcome prediction, outperforming various previous prediction approaches.

Paper Structure (40 sections, 11 equations, 12 figures, 2 tables)

This paper contains 40 sections, 11 equations, 12 figures, 2 tables.

Introduction
RELATED WORK
Large Language Models (LLMs) for Medicine
Patient Outcome Prediction
EHR Data Generation
METHODOLOGY
Problem Formulation and Overview Pipeline
Prompt Tuning for Digital Twin in LLMs
Digital Twin Generation
APPLICATION & EVALUATION
Digital Twins Generation Application
Personalized Generation
Counterfactual Generation
Clinical Trial Outcome Prediction Evaluation of Digital Twins
Dimension-wise probability.
...and 25 more sections

Figures (12)

Figure 1: The workflow of TWIN-GPT. (Bottom) TWIN-GPT takes the real follow-up visits $X_{n,1:T_n-1}$ of a patient and generates the twin visit for the next step, $\hat{X}_{n,T_n}$. In this way, the whole visit sequence can be predicted. (Top) The top part shows how the digital twin visits $\hat{x}_{n,1:t}$ (up to time step $t$) are used to predict the events that occur at the next time step, $\hat{x}^{\text{event}}_{n,t+1}$. The K nearest neighboring patient visits are used in TWIN-GPT fine-tuning, but only the original visits are used in prediction. "Treat": treatment; "Med": medication; "AE": adverse event.
Figure 2: On the Original clinical trial dataset, we analyzed the dimension-wise Pearson correlation coefficient (r) of adverse events to evaluate the performance of TWIN-GPT. The x-axis displays the probability across dimensions for real data, while the y-axis signifies the probability associated with synthetic data.
Figure 3: On the TOP dataset, we analyzed the dimension-wise Pearson correlation coefficient (r) of adverse events to evaluate the performance of TWIN-GPT. The x-axis displays the probability across dimensions for real data, while the y-axis is the probability associated with synthetic data.
Figure 4: On the TOP dataset, we analyzed the dimension-wise Pearson correlation coefficient (r) of adverse events in the three phases to evaluate the performance of TWIN-GPT. The x-axis displays the probability across dimensions for real data, while the y-axis signifies the probability associated with synthetic data.
Figure 5: Patient-wise Pearson correlation coefficient (r) for TWIN-GPT. r is obtained by plotting the distributional properties (DPs) of each patient's closest match on the x-axis against the DPs of their respective synthetic digital twin on the y-axis. Most participants show high fidelity (r larger than 0.8).
...and 7 more figures
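
The dimension-wise comparison described in the captions of Figures 2 to 4 can be illustrated with a small sketch. It assumes each visit is encoded as a binary vector over event dimensions (for example, one column per adverse event), which is an assumption for illustration rather than the paper's confirmed data layout. The function names `dimension_wise_probability` and `pearson_r` and the toy arrays are hypothetical and do not come from the paper.

```python
import numpy as np


def dimension_wise_probability(visits: np.ndarray) -> np.ndarray:
    """Return, for each event dimension (column), the fraction of visits in which it occurs.

    `visits` is assumed to be a binary matrix of shape (num_visits, num_dimensions).
    """
    return np.asarray(visits, dtype=float).mean(axis=0)


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0.0:
        # r is undefined when either vector is constant.
        return float("nan")
    return float((xc * yc).sum() / denom)


# Toy data (made up for illustration): 4 visits x 4 event dimensions.
real_visits = np.array([
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
])
synthetic_visits = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
])

# x-axis of the scatter plot: per-dimension probability in real data.
p_real = dimension_wise_probability(real_visits)
# y-axis of the scatter plot: per-dimension probability in synthetic data.
p_synthetic = dimension_wise_probability(synthetic_visits)

r = pearson_r(p_real, p_synthetic)
print("real:", p_real)
print("synthetic:", p_synthetic)
print("dimension-wise Pearson r:", round(r, 3))
```

A value of r close to 1 means the synthetic data reproduces the per-dimension event frequencies of the real data. The patient-wise variant in Figure 5 would apply the same correlation to per-patient distributional properties instead of per-dimension probabilities.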
