Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models
Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan
TL;DR
The paper addresses the gap between AI development and clinical practice by highlighting the inadequacy of traditional evaluations and proposing DC-AI RCTs and VC-MedAI as rigorous, cost-effective alternatives. The authors run a two-step DC-AI RCT across 14 medical centers with 125 clinicians and $7500$ diagnosis records, and develop VC-MedAI as a preclinical-like in-silico trial framework that mirrors prospective trials. Results show that DC-AI RCTs reveal substantial interactions between clinicians and AI: invisible random models improve $AUC$ by $3.37$ percentage points, and AI models yield $AUC$ gains ranging from $1.95$ to $10.9$ percentage points. The specialized VC-MedAI simulator achieves an $AUC$ of around $0.81$–$0.82$, the generalized simulator reaches about $0.85$, and VC-MedAI provides a roughly $150$-fold speedup in evaluating new AI tools. The study argues that these methods can accelerate the safe, iterative integration of AI into practice and guide future AI development with clinician collaboration and demographic representativeness.
Abstract
A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-step inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.
