Towards Virtual Clinical Trials of Radiology AI with Conditional Generative Modeling
Benjamin D. Killeen, Bohua Wan, Aditya V. Kulkarni, Nathan Drenkow, Michael Oberst, Paul H. Yi, Mathias Unberath
TL;DR
This work introduces virtual clinical trials (VCTs) for radiology AI by developing a conditional, full-body CT synthesis model based on a latent diffusion framework. The system jointly models $p(\mathbf{X},\mathbf{Y})$ and $p(\mathbf{Z}_{\rm img},\mathbf{Z}_{\rm seg}|\mathbf{a})$ to generate anatomically consistent CT images and segmentations conditioned on demographic attributes $\mathbf{a}$. Through comprehensive evaluation (FID, Dice, organ-volume/centroid correlations, and conditioning fidelity), the authors demonstrate high realism and anatomical plausibility, enabling scalable VCTs for bias auditing and robustness assessment. Applying VCTs to body-fat and muscle-mass estimation tasks, they show that synthetic cohorts recapitulate real-world degradation and biases, outperform conventional weighting in detecting out-of-distribution (OOD) degradation, and reveal the attributes most predictive of errors. The results suggest VCTs can streamline proactive AI validation, help mitigate biases, and support safer deployment of radiology AI, with future work expanding conditioning and exploring emergent properties at scale.
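As an illustration of one of the evaluation metrics named above, the following is a minimal sketch of a per-label Dice score between two segmentation label maps. It is not the authors' evaluation code: the function name `dice_score`, the integer label encoding, and the toy arrays are assumptions made for this example only.

```python
import numpy as np


def dice_score(pred, target, label):
    """Dice overlap for one label: 2 * |P & T| / (|P| + |T|).

    `pred` and `target` are integer label maps of identical shape
    (hypothetical encoding: one integer per anatomical structure).
    Returns NaN when the label is absent from both maps.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    p = pred == label    # voxels predicted as `label`
    t = target == label  # voxels annotated as `label`
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")
    return float(2.0 * np.logical_and(p, t).sum() / denom)


# Toy 2D example standing in for a 3D CT segmentation.
pred = np.array([[0, 1, 1],
                 [0, 1, 0]])
target = np.array([[0, 1, 0],
                   [0, 1, 0]])

score_label_1 = dice_score(pred, target, label=1)
score_label_0 = dice_score(pred, target, label=0)
```

In this toy case, label 1 covers three predicted and two annotated pixels with two in common, so the score is 2·2/(3+2) = 0.8. The same function applies unchanged to 3D volumes, since it only relies on element-wise comparison.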
Abstract
Artificial intelligence (AI) is poised to transform healthcare by enabling personalized and efficient care through data-driven insights. Although radiology is at the forefront of AI adoption, in practice, the potential of AI models is often overshadowed by severe failures to generalize: AI model performance can degrade by up to 20% when transitioning from controlled test environments to clinical use by radiologists. This mismatch raises concerns that radiologists will be misled by incorrect AI predictions in practice and/or grow to distrust AI, rendering these promising technologies practically ineffectual. Exhaustive clinical trials of AI models on abundant and diverse data are thus critical to anticipate AI model degradation when encountering varied data samples. Achieving these goals, however, is challenging due to the high costs of collecting diverse data samples and corresponding annotations. To overcome these limitations, we introduce a novel conditional generative AI model designed for virtual clinical trials (VCTs) of radiology AI, capable of realistically synthesizing full-body CT images of patients with specified attributes. By learning the joint distribution of images and anatomical structures, our model enables precise replication of real-world patient populations with unprecedented detail at this scale. We demonstrate meaningful evaluation of radiology AI models through VCTs powered by our synthetic CT study populations, revealing model degradation and facilitating algorithmic auditing for bias-inducing data attributes. Our generative AI approach to VCTs is a promising avenue towards a scalable solution to assess model robustness, mitigate biases, and safeguard patient care by enabling simpler testing and evaluation of AI models in any desired range of diverse patient populations.
