DiffAtlas: GenAI-fying Atlas Segmentation via Image-Mask Diffusion
Hantao Zhang, Yuhe Liu, Jiancheng Yang, Weidong Guo, Xinyuan Wang, Pascal Fua
TL;DR
DiffAtlas addresses core challenges in atlas-based medical image segmentation by jointly modeling images and masks within a learned diffusion-based atlas space, which lets it generate target image–mask pairs without explicit atlas registration. At inference, a noisy image guidance strategy aligns the segmentation with the input anatomy, while the generative atlas preserves global anatomical priors. On CT and MRI heart data from MM-WHS and TotalSegmentator, DiffAtlas achieves state-of-the-art performance in same-domain, cross-modality, varying-domain, and zero-shot scenarios, with pronounced gains in limited-data settings. The approach offers robust, scalable segmentation that maintains anatomical consistency and remains effective without extensive domain-specific atlases, making it a practical GenAI-inspired solution for atlas-based segmentation.
Abstract
Accurate medical image segmentation is crucial for precise anatomical delineation. Deep learning models like U-Net have shown great success but depend heavily on large datasets and struggle with domain shifts, complex structures, and limited training samples. Recent studies have explored diffusion models for segmentation by iteratively refining masks. However, these methods still retain the conventional image-to-mask mapping, making them highly sensitive to input data, which hampers stability and generalization. In contrast, we introduce DiffAtlas, a novel generative framework that models both images and masks through diffusion during training, effectively "GenAI-fying" atlas-based segmentation. During testing, the model is guided to generate a specific target image-mask pair, from which the corresponding mask is obtained. DiffAtlas retains the robustness of the atlas paradigm while overcoming its scalability and domain-specific limitations. Extensive experiments on CT and MRI across same-domain, cross-modality, varying-domain, and different data-scale settings using the MM-WHS and TotalSegmentator datasets demonstrate that our approach outperforms existing methods, particularly in limited-data and zero-shot modality segmentation. Code is available at https://github.com/M3DV/DiffAtlas.
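To make the train/test asymmetry described above concrete, the following is a minimal, dependency-free sketch and not the released implementation. It assumes a standard DDPM-style forward process with a linear noise schedule, toy flattened images, and a replacement-style reading of the guidance step, in which the image part of the pair is overwritten with the target image noised to the current step while the mask part is left for the model to generate. The function names (`make_alpha_bar`, `q_sample`, `joint_forward`, `apply_noisy_image_guidance`) and all hyperparameters are hypothetical, and the denoising network itself is omitted.

```python
import math
import random

def make_alpha_bar(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def q_sample(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    a = alpha_bar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]

def joint_forward(image, mask, t, alpha_bar, rng):
    """Training side: noise image and mask at the same step, as one pair."""
    return {"image": q_sample(image, t, alpha_bar, rng),
            "mask": q_sample(mask, t, alpha_bar, rng)}

def apply_noisy_image_guidance(pair_t, target_image, t, alpha_bar, rng):
    """Inference side (assumed replacement-style guidance): overwrite the image
    part with the target image noised to step t; keep the mask part as is."""
    return {"image": q_sample(target_image, t, alpha_bar, rng),
            "mask": list(pair_t["mask"])}

rng = random.Random(0)
alpha_bar = make_alpha_bar()
image = [rng.random() for _ in range(16)]          # toy flattened 4x4 image
mask = [1.0 if v > 0.5 else 0.0 for v in image]    # toy binary mask
pair_t = joint_forward(image, mask, 25, alpha_bar, rng)
guided = apply_noisy_image_guidance(pair_t, image, 25, alpha_bar, rng)
```

In a full sampler, a guidance step of this kind would be applied at each reverse step before the (omitted) denoiser predicts the next, less noisy pair, and the final mask part would be read off as the segmentation.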
