SynthFM: Training Modality-agnostic Foundation Models for Medical Image Segmentation without Real Medical Data
Sourya Sengupta, Satrajit Chakrabarty, Keerthi Sravan Ravi, Gopal Avinash, Ravi Soni
TL;DR
SynthFM addresses the annotation bottleneck in medical image segmentation by removing the need for real annotated data. It generates a synthetic dataset that captures shape, boundary, texture, and contrast variations, then trains a decoder from scratch on that data while reusing SAM’s pretrained encoder. Across 11 anatomical structures and 9 public datasets spanning CT, MRI, and Ultrasound, SynthFM consistently outperforms zero-shot baselines such as SAM, SAM2, and UnSAM in an interactive prompting setting, showing modality-agnostic generalization despite being trained only on synthetic data. The approach offers a data-modeling perspective on foundation models for medical imaging and points to 3D extension and broader clinical deployment as future directions.
Abstract
Foundation models like the Segment Anything Model (SAM) excel at zero-shot segmentation of natural images but struggle with medical image segmentation due to differences in texture, contrast, and noise. Annotating medical images is costly and requires domain expertise, which limits the availability of large-scale annotated data. To address this, we propose SynthFM, a synthetic data generation framework that mimics the complexities of medical images, enabling foundation models to adapt without real medical data. We used SAM's pretrained encoder, trained the decoder from scratch on SynthFM's dataset, and evaluated the method on 11 anatomical structures across 9 datasets (CT, MRI, and Ultrasound). SynthFM outperformed zero-shot baselines like SAM and MedSAM, achieving superior results under different prompt settings and on out-of-distribution datasets.
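To make the idea of a synthetic image and mask pair concrete, the following is a minimal sketch, not the SynthFM generator itself. It assumes a single elliptical foreground and randomizes the four factors named above: shape (ellipse centre and radii), contrast (foreground and background intensity), boundary (width of a soft edge), and texture (additive Gaussian noise). The function name `synth_pair`, its parameters, and all sampling ranges are hypothetical choices made for illustration; it uses only the Python standard library so that it runs as is.

```python
import math
import random


def synth_pair(size=64, seed=0):
    """Return (image, mask) as nested lists: one random ellipse on a noisy background.

    Hypothetical illustration only; this is not the SynthFM generator.
    """
    rng = random.Random(seed)
    # Shape: random ellipse centre and radii (kept inside the image).
    cx, cy = rng.uniform(0.3, 0.7) * size, rng.uniform(0.3, 0.7) * size
    rx, ry = rng.uniform(0.1, 0.25) * size, rng.uniform(0.1, 0.25) * size
    # Contrast: random mean intensities for foreground and background.
    fg, bg = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    # Boundary: width of the soft edge, in normalised distance units.
    edge = rng.uniform(0.02, 0.3)
    # Texture: standard deviation of additive Gaussian noise.
    sigma = rng.uniform(0.01, 0.1)

    image, mask = [], []
    for y in range(size):
        img_row, mask_row = [], []
        for x in range(size):
            # Normalised elliptical distance: < 1 inside, > 1 outside.
            d = math.hypot((x - cx) / rx, (y - cy) / ry)
            mask_row.append(1 if d < 1.0 else 0)
            # Logistic ramp blends foreground into background across the edge.
            alpha = 1.0 / (1.0 + math.exp((d - 1.0) / edge))
            value = alpha * fg + (1.0 - alpha) * bg + rng.gauss(0.0, sigma)
            # Clamp to the [0, 1] intensity range.
            img_row.append(min(1.0, max(0.0, value)))
        image.append(img_row)
        mask.append(mask_row)
    return image, mask
```

Calling `synth_pair(seed=k)` for many values of `k` would yield a stream of labeled pairs whose appearance varies along these four axes. In the setup described in the abstract, pairs of this kind would be used to train the decoder while SAM's pretrained encoder is reused; the sketch does not cover that training step.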
