Generative Medical Segmentation
Jiayu Huo, Xi Ouyang, Sébastien Ourselin, Rachel Sparks
TL;DR
Generative Medical Segmentation (GMS) targets the generalization gap in medical image segmentation. A frozen pre-trained vision foundation model encodes images and masks into latent spaces, and a lightweight latent mapping model translates image latents into mask latents, which the frozen model then decodes back to pixel space. This design reduces the number of trainable parameters and improves cross-domain performance across five public datasets spanning ultrasound, histology, dermoscopy, and endoscopy. In experiments, GMS outperforms discriminative and several generative baselines and shows strong domain generalization, particularly on cross-center ultrasound data. Ablation studies highlight the importance of both latent-space and image-space supervision and of the SD-VAE tokenizer. The work suggests that latent-space generative segmentation built on foundation-model representations is a scalable and effective direction for medical image segmentation; the authors plan to extend it to 3D data.
Abstract
Rapid advances in medical image segmentation performance have been driven largely by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). These models follow the discriminative pixel-wise classification paradigm and often generalize poorly across diverse medical imaging datasets. In this manuscript, we introduce Generative Medical Segmentation (GMS), a novel approach that uses a generative model to perform image segmentation. Concretely, GMS employs a robust pre-trained vision foundation model to extract latent representations of images and their ground truth masks, and trains a model that learns a mapping from the image latent to the mask latent. At inference, the predicted latent representation is decoded back into image space by the same pre-trained vision foundation model to produce the estimated segmentation mask. Because the foundation model is kept frozen, GMS has fewer trainable parameters, which reduces the risk of overfitting and improves its generalization capability. Our experiments on five public datasets from different medical imaging domains demonstrate that GMS outperforms existing discriminative and generative segmentation models. Furthermore, GMS generalizes well across datasets from different centers within the same imaging modality. These results suggest that GMS offers a scalable and effective solution for medical image segmentation. The GMS implementation and trained model weights are available at https://github.com/King-HAW/GMS.
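The pipeline described above can be illustrated with a deliberately tiny, hypothetical 1-D sketch in Python. It is not the GMS implementation: the frozen encoder and decoder here are fixed pair-averaging and value-repetition functions standing in for a pre-trained tokenizer such as SD-VAE, and the latent mapping model is reduced to an assumed two-parameter affine map (`w`, `b`). All function names (`encode`, `decode`, `train_latent_mapper`, `segment`) and the loss weight `image_weight` are illustrative assumptions. What the sketch does show is the structure: only the mapping model is trained, and its loss combines a latent-space term with an image-space term computed after decoding.

```python
# Toy 1-D sketch of a GMS-style pipeline (hypothetical; not the authors' code).
# A frozen "tokenizer" (encode/decode) stands in for a pre-trained VAE such as
# SD-VAE; only the tiny latent mapping model (scale w, shift b) is trained.


def encode(signal):
    """Frozen encoder: average non-overlapping pairs (2x downsampling)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def decode(latent):
    """Frozen decoder: nearest-neighbour upsampling (repeat each value twice)."""
    return [v for v in latent for _ in range(2)]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def train_latent_mapper(pairs, lr=0.2, steps=500, image_weight=1.0):
    """Fit p = w * z + b with gradient descent so that p matches the mask latent
    (latent-space loss) and decode(p) matches the mask itself (image-space loss)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for image, mask in pairs:
            z, t = encode(image), encode(mask)  # frozen features, never updated
            p = [w * zk + b for zk in z]  # predicted mask latent
            # Gradient of the latent-space MSE with respect to each p_k.
            g = [2 * (pk - tk) / len(p) for pk, tk in zip(p, t)]
            # Gradient of the image-space MSE: the decoder copies p_k into
            # pixels 2k and 2k+1, so both pixel residuals flow back to p_k.
            recon = decode(p)
            for k in range(len(p)):
                r = (recon[2 * k] - mask[2 * k]) + (recon[2 * k + 1] - mask[2 * k + 1])
                g[k] += image_weight * 2 * r / len(mask)
            # Chain rule through p_k = w * z_k + b.
            grad_w += sum(gk * zk for gk, zk in zip(g, z))
            grad_b += sum(g)
        w -= lr * grad_w / len(pairs)
        b -= lr * grad_b / len(pairs)
    return w, b


def segment(image, w, b):
    """Inference: encode, map in latent space, decode back to pixel space."""
    return decode([w * zk + b for zk in encode(image)])


if __name__ == "__main__":
    image = [0.2, 0.2, 0.8, 0.8, 0.2, 0.2, 0.8, 0.8]
    mask = [0, 0, 1, 1, 0, 0, 1, 1]
    w, b = train_latent_mapper([(image, mask)])
    pred = segment(image, w, b)
    print([int(v > 0.5) for v in pred])  # → [0, 0, 1, 1, 0, 0, 1, 1]
```

Because the encoder and decoder contribute no trainable parameters, the only learnable state is the mapping model, a miniature version of the parameter-count argument made in the abstract. Setting `image_weight=0.0` keeps only the latent-space term, which is the kind of supervision choice the ablation studies examine.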
