Brain tumor segmentation using synthetic MR images -- A comparison of GANs and diffusion models

Muhammad Usman Akbar, Måns Larsson, Anders Eklund

TL;DR

This study addresses privacy-driven barriers to data sharing in medical imaging by evaluating whether synthetic brain MR images can be used to train segmentation models. It compares four 2D GANs (Progressive GAN, StyleGAN 1-3) and a diffusion model, each generating 5-channel images (four MR sequences plus the tumor annotation), and assesses them by training U-Net and Swin transformer segmentation networks on BraTS 2020 and BraTS 2021. Segmentation models trained on synthetic data reach Dice scores that are 80-90% of those obtained with real data. Diffusion models are more prone to memorizing training images when the original dataset is small, while the StyleGANs offer competitive performance. Overall, sharing synthetic data is viable but requires careful handling of memorization and model choice. The generated synthetic images and the trained generative models are publicly available, and the work highlights the need for evaluation metrics beyond FID/IS that better reflect segmentation performance.

Abstract

Large annotated datasets are required for training deep learning models, but in medical imaging data sharing is often complicated by ethics, anonymization and data protection legislation. Generative AI models, such as generative adversarial networks (GANs) and diffusion models, can today produce very realistic synthetic images, and can potentially facilitate data sharing. However, before synthetic medical images are shared, it must first be demonstrated that they can be used for training different networks with acceptable performance. Here, we therefore comprehensively evaluate four GANs (progressive GAN, StyleGAN 1-3) and a diffusion model for the task of brain tumor segmentation (using two segmentation networks, U-Net and a Swin transformer). Our results show that segmentation networks trained on synthetic images reach Dice scores that are 80% to 90% of the Dice scores obtained when training with real images, but that memorization of the training images can be a problem for diffusion models if the original dataset is too small. Our conclusion is that sharing synthetic medical images is a viable alternative to sharing real images, but that further work is required. The trained generative models and the generated synthetic images are shared on AIDA data hub.
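
The headline result is stated as a ratio of Dice scores, so a minimal sketch of how such a ratio could be computed may help. The function names, the empty-mask convention and the toy masks below are assumptions made for illustration; they are not taken from the paper's code.

```python
import numpy as np


def dice_score(pred, target):
    """Dice score between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / denom)


def dice_per_class(pred_labels, target_labels, classes):
    """Per-class Dice for integer label maps, one binary mask per class.

    `classes` holds the label ids to evaluate; the ids are hypothetical
    here and depend on how the annotations are encoded.
    """
    pred_labels = np.asarray(pred_labels)
    target_labels = np.asarray(target_labels)
    return {c: dice_score(pred_labels == c, target_labels == c) for c in classes}


def relative_dice(dice_synthetic, dice_real):
    """Dice obtained with synthetic training data as a fraction of the
    Dice obtained with real training data."""
    return dice_synthetic / dice_real


# Toy masks: 2 overlapping pixels, 3 foreground pixels in each mask,
# so the Dice score is 2 * 2 / (3 + 3).
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(pred, target))
```

A `relative_dice` value between 0.8 and 0.9 corresponds to the "80% to 90%" range quoted in the abstract.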

Paper Structure (21 sections, 5 figures, 9 tables)


Figures (5)

  • Figure 1: Synthetic 5-channel images from the BraTS 2021 data. Each row shows a generative model, except for the top row which shows a real example, and each column shows a different MR sequence.
  • Figure 2: U-Net segmentation performance (Dice score) for different proportions of real images (BraTS 2021) and synthetic images generated by StyleGAN 3 (trained on BraTS 2021), within a constant total of 100,000 images. As the number of real images increases along the x-axis, fewer synthetic images are used. To reduce the effect of random fluctuations, each segmentation model was trained 10 times and the average performance is presented.
  • Figure 3: Left: a real 4-channel image shown during the qualitative evaluation, where the task was to classify each example as real or synthetic. Right: a synthetic 4-channel image shown during the qualitative evaluation.
  • Figure 4: Example U-Net predictions on an image in the BraTS 2020 test set. Classes are visualized as a colored overlay, where red is GD-enhancing tumor, blue is peritumoral edema (ED) and green is necrotic and non-enhancing tumor core (NCR/NET). For each generative model, predictions are shown for four training setups: with and without augmentation, and with and without the original data. The two bottom rows present predictions from training on synthetic images.
  • Figure 5: Example U-Net predictions on an image in the BraTS 2021 test set. Classes are visualized as a colored overlay, where red is GD-enhancing tumor, blue is peritumoral edema (ED) and green is necrotic and non-enhancing tumor core (NCR/NET). For each generative model, predictions are shown for four training setups: with and without augmentation, and with and without the original data. The two bottom rows present predictions from training on synthetic images.
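
The experiment behind Figure 2 keeps the training set at a constant size while trading synthetic images for real ones, and averages 10 trainings per setting. The sketch below shows one way such a mixed set could be assembled. The helper names, the sampling without replacement and the `train_and_eval` callback are assumptions made for illustration; the caption does not specify how the images were drawn.

```python
import random


def mix_real_and_synthetic(real_items, synthetic_items, n_real,
                           total=100_000, seed=0):
    """Build a training list of fixed size `total` containing `n_real`
    real items and `total - n_real` synthetic items.

    Items are drawn without replacement (an assumption) and shuffled.
    """
    if not 0 <= n_real <= total:
        raise ValueError("n_real must lie between 0 and total")
    n_synthetic = total - n_real
    if n_real > len(real_items) or n_synthetic > len(synthetic_items):
        raise ValueError("not enough items to sample without replacement")
    rng = random.Random(seed)
    mixed = rng.sample(real_items, n_real) + rng.sample(synthetic_items, n_synthetic)
    rng.shuffle(mixed)
    return mixed


def mean_over_runs(train_and_eval, n_runs=10):
    """Average a score over repeated trainings.

    `train_and_eval` is a hypothetical callback that takes a seed,
    trains a segmentation model and returns its Dice score.
    """
    scores = [train_and_eval(seed) for seed in range(n_runs)]
    return sum(scores) / len(scores)


# Small-scale usage with placeholder file names instead of images.
real = [f"real_{i}.png" for i in range(20)]
synthetic = [f"synthetic_{i}.png" for i in range(20)]
subset = mix_real_and_synthetic(real, synthetic, n_real=3, total=10)
print(len(subset), sum(name.startswith("real_") for name in subset))
```

Sweeping `n_real` from 0 to `total` and calling `mean_over_runs` at each step would produce one averaged point per proportion, which is the shape of the curve the caption describes.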