Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models
Nicholas Konz, Yuwen Chen, Haoyu Dong, Maciej A. Mazurowski
TL;DR
This work tackles the challenge of enforcing precise anatomical constraints in medical image generation. It introduces SegGuidedDiff, a segmentation-guided diffusion model that conditions on a multi-class anatomical mask at every denoising step and uses a random mask ablation training strategy so that it can follow masks specifying only some anatomical structures. On breast MRI and abdominal/neck-to-pelvis CT datasets, SegGuidedDiff achieves state-of-the-art fidelity to input masks and competitive anatomical realism. It can also tune the anatomical similarity of generated images to chosen real images via latent-space interpolation. The method enables applications such as cross-modality translation and the generation of anatomically paired or counterfactual data, offering a practical tool for medical image synthesis with controllable anatomy.
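The two mechanisms named above can be sketched as follows. The first is conditioning on a multi-class mask at every denoising step. The second is random mask ablation during training. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes that the mask reaches the denoiser through channel-wise concatenation with the noisy image. It also assumes that ablation drops each anatomical class independently with a probability `p_drop`. The function names `ablate_mask` and `conditioned_input` and the parameter `p_drop` are hypothetical.

```python
import numpy as np


def ablate_mask(mask: np.ndarray, num_classes: int, p_drop: float,
                rng: np.random.Generator) -> np.ndarray:
    """Randomly remove whole anatomical classes from a multi-class mask.

    Hypothetical sketch: each foreground class (labels 1..num_classes-1) is
    independently dropped with probability p_drop. Pixels of a dropped class
    become background (label 0), leaving that region unconstrained.
    """
    ablated = mask.copy()  # do not modify the caller's mask
    for c in range(1, num_classes):
        if rng.random() < p_drop:
            ablated[ablated == c] = 0
    return ablated


def conditioned_input(x_t: np.ndarray, mask: np.ndarray,
                      num_classes: int) -> np.ndarray:
    """Stack the noisy image and the scaled mask as channels for the denoiser.

    Assumption: the mask is concatenated channel-wise with the noisy image x_t
    at every denoising step; labels are scaled to [0, 1].
    """
    mask_channel = mask.astype(np.float32) / max(num_classes - 1, 1)
    return np.stack([x_t.astype(np.float32), mask_channel], axis=0)


# Usage on a toy 3x3 mask with background (0) and three organ labels.
rng = np.random.default_rng(0)
mask = np.array([[0, 1, 1], [2, 2, 0], [3, 0, 1]])
ablated = ablate_mask(mask, num_classes=4, p_drop=0.5, rng=rng)
x_t = rng.standard_normal(mask.shape)  # stand-in for a noisy image at step t
net_in = conditioned_input(x_t, ablated, num_classes=4)
print(net_in.shape)  # (2, 3, 3)
```

In this sketch, whole classes are dropped rather than individual pixels. This mirrors the stated goal of conditioning on a selected combination of anatomical constraints. The denoising network itself is omitted.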
Abstract
Diffusion models have enabled remarkably high-quality medical image generation, yet it is challenging to enforce anatomical constraints in generated images. To this end, we propose a diffusion model-based method that supports anatomically-controllable medical image generation by following a multi-class anatomical segmentation mask at each sampling step. We additionally introduce a random mask ablation training algorithm to enable conditioning on a selected combination of anatomical constraints while allowing flexibility in other anatomical areas. We compare our method ("SegGuidedDiff") to existing methods on breast MRI and abdominal/neck-to-pelvis CT datasets with a wide range of anatomical objects. Results show that our method reaches a new state of the art in the faithfulness of generated images to input anatomical masks on both datasets, and is on par with existing methods in general anatomical realism. Finally, our model can also adjust the anatomical similarity of generated images to real images of choice through interpolation in its latent space. SegGuidedDiff has many applications, including cross-modality translation and the generation of paired or counterfactual data. Our code is available at https://github.com/mazurowski-lab/segmentation-guided-diffusion.
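The abstract does not specify the latent interpolation scheme, so the following is only an illustrative sketch. It assumes that a chosen real image has already been mapped to a latent with the same shape as the sampler's initial noise. One way to do this is a deterministic inversion of the sampler, which is not shown here. The sketch uses spherical linear interpolation (slerp), a common choice for Gaussian latents, as an assumed stand-in. The function name `slerp` and the weight `lam` are hypothetical. A value of `lam = 0` gives fresh random noise, and `lam = 1` gives the real image's latent.

```python
import numpy as np


def slerp(z_rand: np.ndarray, z_real: np.ndarray, lam: float) -> np.ndarray:
    """Spherically interpolate between a random latent and a real image's latent.

    Hypothetical helper: lam=0 returns z_rand and lam=1 returns z_real.
    Intermediate values are assumed to trade off diversity against
    anatomical similarity to the real image.
    """
    a, b = z_rand.ravel(), z_real.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))  # angle between the latents
    if np.isclose(np.sin(omega), 0.0):
        # (Anti)parallel latents: fall back to plain linear interpolation.
        return (1.0 - lam) * z_rand + lam * z_real
    w_rand = np.sin((1.0 - lam) * omega) / np.sin(omega)
    w_real = np.sin(lam * omega) / np.sin(omega)
    return w_rand * z_rand + w_real * z_real


# Usage with stand-in latents; in practice z_real would come from inverting a real image.
rng = np.random.default_rng(0)
z_real = rng.standard_normal((64, 64))
z_rand = rng.standard_normal((64, 64))
z_mixed = slerp(z_rand, z_real, lam=0.7)  # would then be denoised under mask guidance
print(z_mixed.shape)  # (64, 64)
```

Slerp is used here instead of a plain weighted average because it roughly preserves the norm of high-dimensional Gaussian latents. A plain average would shrink that norm toward the middle of the path.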
