Guided Synthesis of Labeled Brain MRI Data Using Latent Diffusion Models for Segmentation of Enlarged Ventricles

Tim Ruschke, Jonathan Frederik Carlsen, Adam Espe Hansen, Ulrich Lindberg, Amalie Monberg Hindsholm, Martin Norgaard, Claes Nøhr Ladefoged

TL;DR

Evidence is provided that guided synthesis of labeled brain MRI data using LDMs improves the segmentation of enlarged ventricles and outperforms existing state-of-the-art segmentation models.

Abstract

Deep learning models in medical contexts face challenges like data scarcity, inhomogeneity, and privacy concerns. This study focuses on improving ventricular segmentation in brain MRI images using synthetic data. We employed two latent diffusion models (LDMs): a mask generator trained using 10,000 masks, and a corresponding SPADE image generator optimized using 6,881 scans to create an MRI conditioned on a 3D brain mask. Conditioning the mask generator on ventricular volume, in combination with classifier-free guidance, enabled control of the ventricular volume distribution of the generated synthetic images. Next, the performance of the synthetic data was tested using three nnU-Net segmentation models trained on real, augmented, and entirely synthetic data, respectively. The resulting models were tested on a completely independent hold-out dataset of patients with enlarged ventricles, with manual delineation of the ventricles used as ground truth. The model trained on real data showed a mean absolute error (MAE) of $9.09 \pm 12.18$ mL in predicted ventricular volume, while the models trained on synthetic and augmented data showed MAEs of $7.52 \pm 4.81$ mL and $6.23 \pm 4.33$ mL, respectively. Both the synthetic and augmented models also outperformed the state-of-the-art model SynthSeg, which, due to limited performance in cases of large ventricular volumes, showed an MAE of $7.73 \pm 12.12$ mL, with a standard deviation higher by a factor of 3. The model trained on augmented data showed the highest Dice score of $0.892 \pm 0.05$, slightly outperforming SynthSeg and on par with the model trained on real data. The synthetic model performed similarly to SynthSeg. In summary, we provide evidence that guided synthesis of labeled brain MRI data using LDMs improves the segmentation of enlarged ventricles and outperforms existing state-of-the-art segmentation models.
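
The two evaluation metrics reported in the abstract, the Dice score and the MAE of predicted ventricular volume, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the flattened-mask representation, the handling of two empty masks, and the use of the sample standard deviation for the $\pm$ term are all assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (1 = ventricle)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def volume_ml(mask: Sequence[int], voxel_volume_mm3: float) -> float:
    """Volume of the foreground voxels in millilitres (1 mL = 1000 mm^3)."""
    return sum(1 for v in mask if v) * voxel_volume_mm3 / 1000.0


def volume_mae(pred_ml: Sequence[float],
               truth_ml: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the absolute volume errors.

    Assumption: the reported spread is a sample standard deviation;
    the source does not say which estimator was used.
    """
    errors = [abs(p - t) for p, t in zip(pred_ml, truth_ml)]
    return mean(errors), stdev(errors)
```

Under these assumptions, each test case contributes one Dice value and one absolute volume error, and the reported figures would be the mean and spread of those per-case values across the hold-out set.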

Paper Structure

This paper contains 16 sections, 4 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Histogram of the distribution of normalized ventricular volumes for the LDM100k data (N=10,000) and the MS-FLAIR data (N=6,881). The x-axis displays the normalized volume ratio $c$, which is estimated as the ratio of ventricular volume to intracranial volume to get the relative size of the ventricles, and subsequently normalized over the data to the range between 0 and 1.
  • Figure 2: Overview of the pipeline used to generate synthetic labeled data. Latent diffusion models (DM) are employed in two stages to produce synthetic brain images and corresponding masks. The pipeline begins with a noise input passed through a Mask DM that can be conditioned on the parameter $c$ reflecting the normalized ventricular volume ratio, followed by denoising to produce a latent representation. A Mask Decoder transforms this latent representation into a synthetic segmentation mask. Parallel to this, a SPADE DM, conditioned on the synthetic mask, generates denoised latent representations which are decoded to create a synthetic brain image.
  • Figure 3: Overview of the effect of conditioning and guidance on ventricular volume generation. Specifically, this figure displays the mean ventricular volume (in mL) as a function of the normalized ventricular volume ratio $c$, comparing the ground truth ventricular volumes (black circles) with synthetic data generated using different guidance factors $G \in \{1.0, 2.0, 3.0, 4.0, 5.0\}$. The ground truth values are derived from the validation set using SynthSeg-generated masks. Synthetic masks were generated with similar $c$ values, and the mean ventricular volumes were compared. We also prompt the model with $c > 1.0$ (values not encountered during training) to investigate how well the model can extrapolate to larger ventricular sizes.
  • Figure 4: Example case of synthetic brain images/masks generated using a guidance factor $G=4.0$ and varying conditioning parameters $c$. On the left, the synthetic segmentation mask is displayed, followed by the corresponding synthetic brain image. The right-most three images show variations in ventricular size for increasing $c$ values, with the ventricular label outline highlighted in green. As $c$ increases, the ventricular size expands accordingly, reflecting the expected changes based on the conditioning parameter. Degeneration of the synthetic data is observed for increasing $c$ values (right).
  • Figure 5: (A) Violin plots of ventricular volume for each of the three training datasets: real, synthetic, and augmented (synthetic+real); (B) volume of the predicted ventricle mask for a subset of the NPH-FLAIR$_{test}$ data (N=42) with ventricular volumes above 150 mL, sorted by ground truth volume; (C) percent difference between the ground truth and predicted ventricular volume across the 4 models for the same subset of data as in (B).
  • ...and 3 more figures
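
The conditioning parameter $c$ from Figure 1 and the guidance factor $G$ from Figure 3 can be sketched as follows. Two assumptions are made here and should be read as illustrative only: that "normalized over the data" means min-max scaling, and that guidance uses the common classifier-free convention $\hat{\epsilon} = \epsilon_{\text{uncond}} + G\,(\epsilon_{\text{cond}} - \epsilon_{\text{uncond}})$, under which $G = 1.0$ reduces to the plain conditional prediction. The function names are hypothetical, and noise predictions are shown as flat lists rather than 3D latents.

```python
from typing import List, Sequence


def normalized_volume_ratio(ventricle_ml: Sequence[float],
                            intracranial_ml: Sequence[float]) -> List[float]:
    """Ventricle-to-intracranial volume ratio, scaled to [0, 1].

    Assumption: min-max scaling over the dataset; the source only says
    the ratio is normalized to the range between 0 and 1.
    """
    ratios = [v / icv for v, icv in zip(ventricle_ml, intracranial_ml)]
    lo, hi = min(ratios), max(ratios)
    if hi == lo:
        raise ValueError("all ratios are identical; cannot rescale")
    return [(r - lo) / (hi - lo) for r in ratios]


def guided_noise(eps_uncond: Sequence[float],
                 eps_cond: Sequence[float],
                 guidance: float) -> List[float]:
    """Classifier-free guidance: push the unconditional noise prediction
    toward the conditional one by the factor `guidance` (G)."""
    return [u + guidance * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Three hypothetical subjects with equal intracranial volume.
c_values = normalized_volume_ratio([20.0, 60.0, 100.0],
                                   [1000.0, 1000.0, 1000.0])
# G = 4.0 amplifies the difference between the two predictions.
eps = guided_noise([0.0, 1.0], [1.0, 3.0], guidance=4.0)
```

With this convention, values of $G$ above 1.0 amplify the influence of $c$ on each denoising step, which is consistent with the figure's comparison of generated ventricular volumes across guidance factors.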