3D MedDiffusion: A 3D Medical Latent Diffusion Model for Controllable and High-quality Medical Image Generation
Haoshen Wang, Zhentao Liu, Kaicong Sun, Xiaodong Wang, Dinggang Shen, Zhiming Cui
TL;DR
3D MedDiffusion tackles high-quality 3D medical image generation by introducing a Patch-Volume Autoencoder for memory-efficient latent compression and BiFlowNet, a dual-flow noise estimator for diffusion in latent space. The framework, augmented with ControlNet for task conditioning, achieves state-of-the-art fidelity while supporting diverse downstream tasks such as sparse-view CT reconstruction, fast MRI reconstruction, and data augmentation for segmentation and classification. Extensive experiments across six CT/MRI datasets, ablations, and a radiologist study demonstrate superior generative quality and strong generalization, and the paper also considers practical efficiency. The work enables controllable, high-resolution 3D medical image synthesis and practical adaptation to clinical pipelines, while outlining future directions for arbitrary-size generation and conditioning factors.
Abstract
The generation of medical images presents significant challenges due to their high resolution and three-dimensional nature. Existing methods often yield suboptimal performance in generating high-quality 3D medical images, and there is currently no universal generative framework for medical imaging. In this paper, we introduce a 3D Medical Latent Diffusion (3D MedDiffusion) model for controllable, high-quality 3D medical image generation. 3D MedDiffusion incorporates a novel, highly efficient Patch-Volume Autoencoder that compresses medical images into latent space through patch-wise encoding and recovers them back into image space through volume-wise decoding. Additionally, we design a new noise estimator to capture both local details and global structural information during the diffusion denoising process. 3D MedDiffusion can generate fine-detailed, high-resolution images (up to 512x512x512) and effectively adapt to various downstream tasks, as it is trained on large-scale datasets covering CT and MRI modalities and different anatomical regions (from head to leg). Experimental results demonstrate that 3D MedDiffusion surpasses state-of-the-art methods in generative quality and exhibits strong generalizability across tasks such as sparse-view CT reconstruction, fast MRI reconstruction, and data augmentation for segmentation and classification. Source code and checkpoints are available at https://github.com/ShanghaiTech-IMPACT/3D-MedDiffusion.
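
To make the patch-wise encoding and volume-wise decoding idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. The learned encoder and decoder are replaced by toy stand-ins (average pooling and nearest-neighbour upsampling), and the function names, the patch size of 16, and the compression factor of 4 are all hypothetical choices made for illustration.

```python
import numpy as np

def encode_patch(patch, factor):
    """Toy stand-in for a learned encoder: average-pool by `factor` per axis."""
    d, h, w = patch.shape
    blocks = patch.reshape(d // factor, factor, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3, 5))

def decode_volume(latent, factor):
    """Toy stand-in for a learned decoder: upsample the whole latent volume at once."""
    for axis in range(3):
        latent = np.repeat(latent, factor, axis=axis)
    return latent

def patchwise_encode(volume, patch, factor):
    """Encode non-overlapping patches independently, then stitch the latents."""
    D, H, W = volume.shape
    # Assumption of this sketch: the volume tiles evenly into patches.
    assert D % patch == 0 and H % patch == 0 and W % patch == 0
    assert patch % factor == 0
    lp = patch // factor  # latent patch edge length
    latent = np.empty((D // factor, H // factor, W // factor), dtype=np.float64)
    for z in range(0, D, patch):
        for y in range(0, H, patch):
            for x in range(0, W, patch):
                # Only one patch is processed at a time, which bounds encoder memory.
                code = encode_patch(volume[z:z + patch, y:y + patch, x:x + patch], factor)
                lz, ly, lx = z // factor, y // factor, x // factor
                latent[lz:lz + lp, ly:ly + lp, lx:lx + lp] = code
    return latent

volume = np.random.default_rng(0).random((32, 32, 32))
latent = patchwise_encode(volume, patch=16, factor=4)  # patch-wise encoding
recon = decode_volume(latent, factor=4)                # volume-wise decoding
print(latent.shape, recon.shape)
```

In this sketch the encoder only ever sees one patch, while the decoder operates on the stitched latent volume as a whole. Whether and how the actual model handles patch boundaries is not described in the abstract, so nothing here should be read as a statement about that.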
