Conditional Diffusion Models for Semantic 3D Brain MRI Synthesis

Zolnamar Dorjsembe; Hsing-Kuo Pao; Sodtavilan Odonchimed; Furen Xiao

TL;DR

The paper tackles data scarcity and privacy in brain MRI by introducing Med-DDPM, a conditional diffusion model that generates high-fidelity 3D brain MRIs guided by segmentation masks through channel-wise conditioning, i.e., $\tilde{x}_t = x_t \oplus c$, where $\oplus$ denotes channel-wise concatenation of the noisy image $x_t$ with the conditioning mask $c$. It demonstrates that a pixel-wise L1 loss preserves detail better than L2 in this setting and achieves competitive, if not superior, fidelity and diversity compared with GAN baselines. Used for data augmentation, Med-DDPM improves downstream tumor segmentation: a model trained on synthetic images alone reaches a Dice score of 0.6207, close to the 0.6531 obtained with real data, and combining real and synthetic images (1k real + 2k synthetic) raises it to 0.6675. The method also enables multimodal synthesis (T1, T1CE, T2, FLAIR) from masks. Although its memory demands are higher than those of GAN-based methods and some vascular and mass-effect cues require refinement, the method offers a principled path toward data-efficient, privacy-preserving medical image synthesis with practical augmentation and anonymization potential.
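
The channel-wise conditioning and L1 noise-prediction loss mentioned in the TL;DR can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a standard DDPM forward process with a linear beta schedule, a channels-first layout `(C, D, H, W)`, and a three-channel mask, and every function name here (`make_alpha_bar`, `q_sample`, `concat_condition`, `dummy_noise_predictor`) is hypothetical. The noise predictor is a placeholder standing in for the U-Net.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, noise):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def concat_condition(x_t, mask):
    """Channel-wise conditioning: append mask channels to the noisy image."""
    return np.concatenate([x_t, mask], axis=0)  # (C_img + C_mask, D, H, W)

def dummy_noise_predictor(x_tilde, num_image_channels):
    """Placeholder for the U-Net: returns zeros shaped like the image noise."""
    return np.zeros_like(x_tilde[:num_image_channels])

def l1_noise_loss(pred_noise, true_noise):
    """Mean absolute error between predicted and true noise."""
    return float(np.mean(np.abs(pred_noise - true_noise)))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1, 8, 8, 8))                        # toy image volume
mask = rng.integers(0, 2, size=(3, 8, 8, 8)).astype(x0.dtype)  # toy mask channels
noise = rng.standard_normal(x0.shape)

alpha_bar = make_alpha_bar()
x_t = q_sample(x0, 500, alpha_bar, noise)     # noisy image at step t = 500
x_tilde = concat_condition(x_t, mask)         # conditioned network input
pred = dummy_noise_predictor(x_tilde, num_image_channels=1)
loss = l1_noise_loss(pred, noise)
```

Because the mask enters only as extra input channels, the denoising network needs no architectural change beyond a wider first layer; its output keeps the shape of the image noise.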

Abstract

Artificial intelligence (AI) in healthcare, especially in medical imaging, faces challenges due to data scarcity and privacy concerns. To address these, we introduce Med-DDPM, a diffusion model designed for 3D semantic brain MRI synthesis. The model tackles data scarcity and privacy issues by integrating semantic conditioning: a conditioning image is concatenated channel-wise to the model input, enabling control over image generation. Med-DDPM demonstrates superior stability and performance compared to existing 3D brain imaging synthesis methods. It generates diverse, anatomically coherent images with high visual fidelity. On the tumor segmentation task, Med-DDPM achieves a Dice score of 0.6207, close to the 0.6531 obtained with real images, and outperforms baseline models. Combined with real images, it further increases the Dice score to 0.6675, showing the potential of our proposed method for data augmentation. This model represents the first use of a diffusion model in 3D semantic brain MRI synthesis, producing high-quality images. Its semantic conditioning feature also shows potential for image anonymization in biomedical imaging, addressing data and privacy issues. We provide the code and model weights for Med-DDPM on our GitHub repository (https://github.com/mobaidoctor/med-ddpm/) to support reproducibility.
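
For readers unfamiliar with the metric behind the figures quoted in the abstract, the Dice coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below is a generic illustration on flattened binary masks, not the paper's evaluation code; the function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * intersection / total

pred = [0, 1, 1, 1, 0, 0]    # toy predicted tumor mask
target = [0, 1, 1, 0, 0, 1]  # toy reference tumor mask
print(round(dice_score(pred, target), 4))  # 0.6667
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they do not overlap at all.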

Paper Structure (15 sections, 4 equations, 7 figures, 3 tables, 2 algorithms)

Introduction
Method
Loss Function
Experiments and Results
Datasets and Image Preprocessing
Experiment Details
Evaluation Metric
Generated Images
Quantitative Results
Qualitative Results
Comparison of Segmentation Models Trained on Synthetic Images
3D Multimodal MRI Synthesis Experiment
Memory Efficiency
Discussion
Conclusion

Figures (7)

Figure 1: Architecture of the proposed method. The top row of the diagram demonstrates the conditioning mechanism of our approach, featuring the forward diffusion process $q(x_t | x_{t-1})$ and the denoising process $p_\theta (x_{t-1} | \tilde{x}_t)$. This process involves concatenating the conditioning mask $c$ with the input image $x_t$, resulting in the concatenated image $\tilde{x}_t$ used in the denoising process $p_\theta$. The bottom row presents an enhanced model architecture, adapted from previous work, providing a detailed view of the noise predictor U-Net model $\epsilon_\theta$. This model predicts the noise $\epsilon'$, a critical component of the denoising process, as detailed in Eq. (3).
Figure 2: Comparison of overall quality in 3D brain MRI synthesis. This figure presents the quality comparison between real and synthetic 3D brain MRIs across coronal, sagittal, and axial slices. The first row displays a random real MRI sample alongside a synthetic sample from our proposed method. The second row presents random samples from baseline conditional synthesis methods. The final two rows showcase random samples from the latest unconditional synthesis methods, specifically designed for 3D brain MRI synthesis.
Figure 3: Zoomed visual comparison of tumor areas in real and generated samples (axial view slices). Med-DDPM and 3D DiscoGAN generate more realistic tumor regions with smoother edges and fewer artifacts. 3D Pix2Pix, on the other hand, synthesizes tumors poorly, with strong artifacts that look unrealistic.
Figure 4: Comparison of synthetic images generated with manipulated masks (axial view slices). 3D DiscoGAN primarily captures the same brain features, with only slight variations in pixel intensities for tumor parts. 3D Pix2Pix also exhibits a similar limitation, highlighting the issue of mode collapse and the lack of diverse image generation in GAN models. In contrast, the proposed method, Med-DDPM, excels in synthesizing diverse images with strong variations.
Figure 5: Center-cut axial slices of generated samples, showcasing the output diversity of Med-DDPM for a single input mask.
...and 2 more figures
