Free Lunch in Medical Image Foundation Model Pre-training via Randomized Synthesis and Disentanglement
Yuhan Wei, Yuting He, Linshan Wu, Fuxiang Huang, Junlin Hou, Hao Chen
TL;DR
RaSD introduces a paradigm-shifting approach to medical image foundation model pre-training based on fully synthetic, on-the-fly data generation with randomized Gaussian structures and appearance variations. Through prototype disentangling learning, RaSD encourages region-wise semantic decoupling and cohesive regional features, enabling robust transfer across 6 modalities, 48 datasets, and 56 downstream tasks. Across 3D CT/MRI, 2D X-ray, ultrasound, fundus, and pathology domains, RaSD matches or surpasses real-data pre-trained baselines on many tasks, while requiring no data storage, preserving privacy, and supporting scalable online training. The results suggest that synthetic data alone can support scalable, generalizable medical AI foundations, with broader implications for privacy-conscious clinical deployment and rapid model expansion.
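The TL;DR names prototype disentangling learning but does not specify its objective. One generic way to express "region-wise semantic decoupling and cohesive regional features" is a prototype cross-entropy: average the normalized embeddings of each synthetic region into a prototype, then score every embedding against all prototypes so that it is pulled toward its own region's prototype and away from the others. The sketch below, including the name `prototype_disentangle_loss` and its `temperature` parameter, is a hypothetical NumPy illustration of that idea under these assumptions, not the authors' loss; an actual training loop would compute it in an autodiff framework.

```python
import numpy as np


def prototype_disentangle_loss(features, labels, temperature=0.1):
    """Hypothetical prototype cross-entropy (an assumption, not RaSD's exact loss).

    features: (N, D) array of pixel/voxel embeddings.
    labels:   (N,) integer region ids, e.g. taken from a synthetic label map.
    """
    # L2-normalize embeddings so that dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    regions = np.unique(labels)  # sorted region ids
    # One prototype per region: the mean of that region's normalized embeddings.
    protos = np.stack([f[labels == r].mean(axis=0) for r in regions])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)

    # Temperature-scaled similarity of every embedding to every prototype, shape (N, R).
    logits = f @ protos.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy toward each embedding's own region prototype: minimizing it
    # pulls features toward their prototype (cohesion) and away from the
    # prototypes of other regions (decoupling).
    target = np.searchsorted(regions, labels)
    return float(-log_probs[np.arange(len(labels)), target].mean())


# Toy usage: three well-separated regions versus the same features with shuffled labels.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
features = 5.0 * np.eye(3, 8)[labels] + 0.1 * rng.normal(size=(60, 8))
loss_aligned = prototype_disentangle_loss(features, labels)
loss_shuffled = prototype_disentangle_loss(features, rng.permutation(labels))
print(loss_aligned, loss_shuffled)
```

When region labels match the feature clusters, the loss is close to zero; shuffling the labels mixes the prototypes together and the loss rises toward log of the number of regions, which is the behavior a decoupling objective should reward and penalize, respectively.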
Abstract
Medical image foundation models (MIFMs) have demonstrated remarkable potential for a wide range of clinical tasks, yet their development is constrained by the scarcity, heterogeneity, and high cost of large-scale annotated datasets. Here, we propose RaSD (Randomized Synthesis and Disentanglement), a scalable framework for pre-training MIFMs entirely on synthetic data. By modeling anatomical structures and appearance variations with randomized Gaussian distributions, RaSD exposes models to sufficiently diverse multi-scale structural and appearance perturbations, forcing them to rely on invariant, task-relevant anatomical cues rather than dataset-specific textures and thereby enabling robust and transferable representation learning. We pre-trained RaSD on 1.2 million 3D volumes and 9.6 million 2D images, and extensively evaluated the resulting models across 6 imaging modalities, 48 datasets, and 56 downstream tasks. Across all evaluated downstream tasks, RaSD consistently outperforms training-from-scratch models, achieves the best performance on 17 tasks, and remains comparable to models pre-trained on large real datasets on most of the others. These results demonstrate the capacity of synthetic data alone to drive robust representation learning. Our findings establish a paradigm shift in medical AI, demonstrating that synthetic data can serve as a "free lunch" for scalable, privacy-preserving, and clinically generalizable foundation models.
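The abstract describes the synthesis step only at a high level. As a minimal 2D sketch, assuming that "randomized Gaussian" structures can be approximated by randomly placed, oriented, and scaled anisotropic Gaussian blobs, and that appearance variation amounts to a random intensity per region plus noise and a random gamma, one synthetic image and its region label map could be generated as follows. The function `synthesize_gaussian_image`, its parameters, and all sampling ranges are hypothetical illustrations, not RaSD's actual generator.

```python
import numpy as np


def synthesize_gaussian_image(shape=(64, 64), n_blobs=5, rng=None):
    """Hypothetical sketch (not RaSD's generator): build one synthetic 2D image
    from randomized anisotropic Gaussian "structures" with random appearance.

    Returns (image, label_map): image is float32 in [0, 1]; label_map holds
    0 for background and 1..n_blobs for the Gaussian structures.
    """
    rng = np.random.default_rng(rng)
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)

    # Background "response": a pixel stays background unless some blob is
    # stronger than exp(-2), i.e. it lies within ~2 standard deviations of a blob.
    responses = [np.full(shape, np.exp(-2.0))]
    for _ in range(n_blobs):
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)          # random position
        sy, sx = rng.uniform(2, h / 4), rng.uniform(2, w / 4)  # random multi-scale extent
        theta = rng.uniform(0, np.pi)                          # random orientation
        dy, dx = yy - cy, xx - cx
        u = dx * np.cos(theta) + dy * np.sin(theta)  # rotated coordinates
        v = -dx * np.sin(theta) + dy * np.cos(theta)
        responses.append(np.exp(-0.5 * ((u / sx) ** 2 + (v / sy) ** 2)))

    # Each pixel belongs to the structure with the strongest Gaussian response.
    label_map = np.stack(responses).argmax(axis=0)

    # Appearance: random mean intensity per region, additive noise, random gamma.
    means = rng.uniform(0.0, 1.0, size=n_blobs + 1)
    noise_std = rng.uniform(0.01, 0.1)
    image = means[label_map] + rng.normal(0.0, noise_std, size=shape)
    image = np.clip(image, 0.0, 1.0) ** rng.uniform(0.7, 1.5)
    return image.astype(np.float32), label_map


image, label_map = synthesize_gaussian_image((64, 64), n_blobs=5, rng=0)
print(image.shape, image.dtype, int(label_map.max()))
```

Because every call draws a fresh random layout and appearance, samples can be produced on the fly during training with nothing stored on disk, and the returned label map supplies region identities that a region-wise objective could consume. Extending the sketch to 3D volumes would only add a third coordinate grid and a 3D rotation.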
