SynthFM: Training Modality-agnostic Foundation Models for Medical Image Segmentation without Real Medical Data

Sourya Sengupta, Satrajit Chakrabarty, Keerthi Sravan Ravi, Gopal Avinash, Ravi Soni

TL;DR

SynthFM addresses the annotation bottleneck in medical image segmentation by removing the need for real annotated data. It generates a synthetic dataset that captures shape, boundary, texture, and contrast variations, then trains a decoder from scratch while reusing SAM’s pretrained encoder. Across 11 anatomical structures and 9 public datasets spanning CT, MRI, and Ultrasound, SynthFM consistently outperforms zero-shot baselines such as SAM, SAM2, and UnSAM in an interactive prompting setting, indicating modality-agnostic generalization from purely synthetic training data. The approach takes a data-modeling perspective on medical imaging and points to 3D extension and broader clinical deployment as future directions.

Abstract

Foundation models like the Segment Anything Model (SAM) excel in zero-shot segmentation for natural images but struggle with medical image segmentation due to differences in texture, contrast, and noise. Annotating medical images is costly and requires domain expertise, limiting large-scale annotated data availability. To address this, we propose SynthFM, a synthetic data generation framework that mimics the complexities of medical images, enabling foundation models to adapt without real medical data. Using SAM's pretrained encoder and training the decoder from scratch on SynthFM's dataset, we evaluated our method on 11 anatomical structures across 9 datasets (CT, MRI, and Ultrasound). SynthFM outperformed zero-shot baselines like SAM and MedSAM, achieving superior results under different prompt settings and on out-of-distribution datasets.
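
To make the idea of synthetic image/mask pairs concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a single star-shaped blob whose outline is perturbed by random angular harmonics (shape variation), random foreground and background intensities (contrast variation), and per-pixel Gaussian noise (a crude stand-in for texture). The function name `synth_sample` and all parameter ranges are hypothetical, and the boundary-aware module described in the paper is not modeled here.

```python
import math
import random


def synth_sample(size=64, seed=0):
    """Return (image, mask) as nested lists; image values lie in [0, 1].

    Hypothetical illustration only: one blob, flat intensities plus noise.
    """
    rng = random.Random(seed)
    cy, cx = size / 2, size / 2
    base_r = size * rng.uniform(0.2, 0.3)

    # Shape variation: perturb the radius with a few random angular harmonics.
    harmonics = [
        (k, rng.uniform(0.0, 0.15), rng.uniform(0.0, 2 * math.pi))
        for k in range(2, 5)
    ]

    # Contrast variation: random mean intensity for foreground and background.
    fg, bg = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)

    # Texture stand-in: standard deviation of per-pixel Gaussian noise.
    noise = rng.uniform(0.01, 0.1)

    image, mask = [], []
    for y in range(size):
        img_row, mask_row = [], []
        for x in range(size):
            dy, dx = y - cy, x - cx
            theta = math.atan2(dy, dx)
            # Radius of the blob outline in the direction of this pixel.
            r = base_r * (1 + sum(a * math.sin(k * theta + p)
                                  for k, a, p in harmonics))
            inside = math.hypot(dy, dx) <= r
            value = (fg if inside else bg) + rng.gauss(0.0, noise)
            img_row.append(min(1.0, max(0.0, value)))  # clamp to [0, 1]
            mask_row.append(1 if inside else 0)
        image.append(img_row)
        mask.append(mask_row)
    return image, mask


if __name__ == "__main__":
    image, mask = synth_sample(size=64, seed=1)
    area = sum(sum(row) for row in mask)
    print(f"foreground pixels: {area} of {64 * 64}")
```

Because each sample is fully determined by its seed, a training set of any size can be regenerated on demand instead of being stored. Under the setup described in the abstract, pairs like these would supervise only the decoder, while SAM's pretrained encoder is reused.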

Paper Structure

This paper contains 14 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Different stages of data generation of (a) shape-aware and (b) boundary-aware module. (c) Examples of synthetic data generated using shape-aware module (top) and boundary-aware module (bottom).
  • Figure 2: Qualitative results on different structures across CT, MRI, and Ultrasound modalities for (1 +ve, 2 -ve) clicks. Positive (+ve) and negative (-ve) click prompts are each marked with an x, distinguished in the figure itself.