DiffAtlas: GenAI-fying Atlas Segmentation via Image-Mask Diffusion

Hantao Zhang, Yuhe Liu, Jiancheng Yang, Weidong Guo, Xinyuan Wang, Pascal Fua

TL;DR

DiffAtlas addresses core challenges in atlas-based medical image segmentation by jointly modeling images and masks within a learned diffusion-based atlas space, enabling generation of target image–mask pairs without explicit atlas registration. It incorporates a noisy image guidance strategy during inference to align segmentation with the input anatomy, while preserving global anatomical priors through the generative atlas. Across CT and MRI heart datasets MM-WHS and TotalSegmentator, DiffAtlas achieves state-of-the-art performance in same-domain, cross-modality, varying-domain, and zero-shot scenarios, with pronounced gains in limited-data settings. The approach offers robust, scalable segmentation that maintains anatomical consistency and remains effective without extensive domain-specific atlases, making it a practical GenAI-inspired solution for atlas-based segmentation.

Abstract

Accurate medical image segmentation is crucial for precise anatomical delineation. Deep learning models like U-Net have shown great success but depend heavily on large datasets and struggle with domain shifts, complex structures, and limited training samples. Recent studies have explored diffusion models for segmentation by iteratively refining masks. However, these methods still retain the conventional image-to-mask mapping, which makes them highly sensitive to input data and hampers stability and generalization. In contrast, we introduce DiffAtlas, a novel generative framework that models both images and masks through diffusion during training, effectively "GenAI-fying" atlas-based segmentation. During testing, the model is guided to generate a specific target image-mask pair, from which the corresponding mask is obtained. DiffAtlas retains the robustness of the atlas paradigm while overcoming its scalability and domain-specific limitations. Extensive experiments on CT and MRI across same-domain, cross-modality, varying-domain, and different data-scale settings using the MM-WHS and TotalSegmentator datasets demonstrate that our approach outperforms existing methods, particularly in limited-data and zero-shot modality segmentation. Code is available at https://github.com/M3DV/DiffAtlas.
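
The test-time guidance described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that); it assumes a standard DDPM-style linear noise schedule, a single-channel mask, and a guidance rule in which the image channel is overwritten at every reverse step with a forward-noised copy of the target image. The names `make_alpha_bar`, `q_sample`, `guided_sampling`, and `denoise_step` are hypothetical and introduced only for illustration.

```python
import numpy as np


def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention terms for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng):
    """Forward diffusion: noise a clean array x0 to timestep t."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def guided_sampling(denoise_step, target_image, alpha_bar, rng):
    """Reverse diffusion over a joint (image, mask) pair with noisy image guidance.

    denoise_step(pair_t, t) -> pair_{t-1} stands in for a trained joint
    image-mask denoiser (hypothetical; any callable with this shape works).
    """
    num_steps = len(alpha_bar)
    # Channel 0 holds the image, channel 1 the mask; both start as pure noise.
    pair = rng.standard_normal((2,) + target_image.shape)
    for t in range(num_steps - 1, -1, -1):
        # Assumed guidance rule: pin the image channel to the target image
        # noised to the current timestep, so the mask is denoised jointly
        # with the anatomy of the input rather than a random sample.
        pair[0] = q_sample(target_image, t, alpha_bar, rng)
        pair = denoise_step(pair, t)
    return pair[1]  # the generated mask channel
```

In this sketch the mask is never conditioned on the clean image directly; it only sees the image through the shared noisy state, which is one way to read "generating a specific target image-mask pair" rather than learning an image-to-mask mapping.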

Paper Structure

This paper contains 22 sections, 6 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Illustration of different segmentation paradigms. (a) Directly maps the input image to the segmentation mask; (b) Uses registration to align a labeled atlas with the target image and propagate atlas labels; (c) Conditioned on the input image, achieves image-to-segmentation mask mapping using diffusion; (d) Train: Parameterizes an atlas and simultaneously generates image-mask pairs. Test: Uses the noisy image as guidance to generate the corresponding atlas.
  • Figure 2: Visualization of 2-shot cross-domain and few-shot results.