TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model
Yue Wang, Tianfan Fu, Yinlong Xu, Zihan Ma, Hongxia Xu, Yingzhou Lu, Bang Du, Honghao Gao, Jian Wu
TL;DR
TWIN-GPT uses a ChatGPT-based large language model fine-tuned on clinical-trial data to create personalized digital twins for virtual clinical trials. By leveraging prompt tuning and k-nearest neighbor context, it generates patient-specific trajectories that improve outcome prediction and enable counterfactual analyses while addressing data gaps and inconsistencies in electronic health records (EHRs). The approach reproduces the distribution of event patterns with high fidelity, maintains predictive performance comparable to real data for severe outcomes and adverse events, and exhibits favorable privacy metrics for presence disclosure, attribute disclosure, and nearest-neighbor adversarial accuracy (NNAA) risk. This work has practical implications for accelerating trial design, reducing participant burden, and enhancing patient safety through in silico evaluation grounded in the clinical knowledge embedded in LLMs.
Abstract
Clinical trials are indispensable for medical research and the development of new treatments. However, clinical trials often involve thousands of participants and can span several years to complete, with a high probability of failure during the process. Recently, there has been burgeoning interest in virtual clinical trials, which simulate real-world scenarios and hold the potential to significantly enhance patient safety, expedite development, reduce costs, and contribute to broader scientific knowledge in healthcare. Existing research often focuses on leveraging electronic health records (EHRs) to support clinical trial outcome prediction. Yet, trained on limited clinical trial outcome data, existing approaches frequently struggle to make accurate predictions. Some research has attempted to generate synthetic EHRs to augment model development but has fallen short in personalizing the generation for individual patient profiles. The emergence of large language models has opened new possibilities, as the comprehensive clinical knowledge embedded in them has proven beneficial in addressing medical problems. In this paper, we propose a large language model-based digital twin creation approach, called TWIN-GPT. TWIN-GPT can establish cross-dataset associations of medical information given limited data, generating a unique personalized digital twin for each patient and thereby preserving individual patient characteristics. Comprehensive experiments show that using digital twins created by TWIN-GPT improves clinical trial outcome prediction, outperforming various previous prediction approaches.
