Time-to-Event Pretraining for 3D Medical Imaging
Zepeng Huo, Jason Alan Fries, Alejandro Lozano, Jeya Maria Jose Valanarasu, Ethan Steinberg, Louis Blankemeier, Akshay S. Chaudhari, Curtis Langlotz, Nigam H. Shah
TL;DR
This work tackles the missing temporal context in 3D medical imaging pretraining by introducing time-to-event pretraining, which leverages large-scale longitudinal EHR data to generate thousands of prognostic tasks. Using 18,945 chest CTs linked to 225M clinical events, the authors train a 3D encoder with a time-to-event objective (8,192 tasks) and adapt it with a CoxPH head for prognosis and a classifier head for diagnosis. The approach yields substantial gains in prognostic metrics (average AUROC up about 0.13 and Harrell's C-index up about 0.16) and improved calibration, while preserving diagnostic performance on external tasks. The results demonstrate the value of incorporating longitudinal outcome data into 3D imaging pretraining, enabling better clinical risk prediction and paving the way for multi-modal, prognosis-oriented foundation models.
Abstract
With the rise of medical foundation models and the growing availability of imaging data, scalable pretraining techniques offer a promising way to identify imaging biomarkers predictive of future disease risk. While current self-supervised methods for 3D medical imaging models capture local structural features like organ morphology, they fail to link pixel biomarkers with long-term health outcomes due to a missing context problem. Current approaches lack the temporal context necessary to identify biomarkers correlated with disease progression, as they rely on supervision derived only from images and concurrent text descriptions. To address this, we introduce time-to-event pretraining, a pretraining framework for 3D medical imaging models that leverages large-scale temporal supervision from paired, longitudinal electronic health records (EHRs). Using a dataset of 18,945 CT scans (4.2 million 2D images) and time-to-event distributions across thousands of EHR-derived tasks, our method improves outcome prediction, achieving an average AUROC increase of 23.7% and a 29.4% gain in Harrell's C-index across 8 benchmark tasks. Importantly, these gains are achieved without sacrificing diagnostic classification performance. This study lays the foundation for integrating longitudinal EHR and 3D imaging data to advance clinical risk prediction.
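For readers unfamiliar with the prognostic metric reported above, the following is a minimal sketch of Harrell's C-index on right-censored data. It is an illustration only, not the authors' evaluation code: the function name `harrell_c_index` and the toy inputs are assumptions, and pairs with tied event times are simply skipped. A pair of patients is treated as comparable when the patient with the shorter observed time actually experienced the event; the index is the fraction of comparable pairs in which that patient also received the higher predicted risk, with tied risk scores counted as half.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times       : observed time for each subject (event or censoring time)
    events      : 1/True if the event was observed, 0/False if censored
    risk_scores : predicted risk; higher means the event is expected sooner
    """
    n = len(times)
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            # Comparable pair: subject i had the event strictly before
            # subject j's observed time (j may be an event or censored).
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data (hypothetical): four subjects, the third one is censored.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]  # risk decreases as survival time increases

print(harrell_c_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the reported C-index gains should be read.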
