Beware of diffusion models for synthesizing medical images -- A comparison with GANs in terms of memorizing brain MRI and chest x-ray images

Muhammad Usman Akbar, Wuhao Wang, Anders Eklund

TL;DR

The paper addresses the risk of memorization in diffusion models when synthesizing medical images and compares them with StyleGAN on BRATS brain MRI and chest X-ray pneumonia data. By measuring pixel-wise correlations between synthetic outputs and all training images, the study finds diffusion models memorize training data more readily than GANs, especially with small datasets and 2D slices from 3D volumes. This memorization has privacy implications for sharing synthetic medical data and highlights the inadequacy of FID/IS alone to assess risk. The authors discuss data-size effects, model-size effects, and potential strategies to detect and mitigate memorization in medical applications.

Abstract

Diffusion models were initially developed for text-to-image generation and are now being utilized to generate high quality synthetic images. Preceded by GANs, diffusion models have shown impressive results using various evaluation metrics. However, commonly used metrics such as FID and IS are not suitable for determining whether diffusion models are simply reproducing the training images. Here we train StyleGAN and a diffusion model, using BRATS20, BRATS21 and a chest x-ray pneumonia dataset, to synthesize brain MRI and chest x-ray images, and measure the correlation between the synthetic images and all training images. Our results show that diffusion models are more likely to memorize the training images, compared to StyleGAN, especially for small datasets and when using 2D slices from 3D volumes. Researchers should be careful when using diffusion models (and to some extent GANs) for medical imaging, if the final goal is to share the synthetic images.
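
The memorization check described in the abstract (correlating each synthetic image with all training images and keeping the highest value) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the correlation is the Pearson correlation over flattened pixel intensities, and the function name `max_correlation` and the toy arrays are hypothetical.

```python
import numpy as np


def max_correlation(synthetic, training):
    """For each synthetic image, return (highest Pearson correlation with any
    training image, index of that training image).

    Assumption: images are arrays of identical shape, stacked along axis 0.
    """
    s = synthetic.reshape(len(synthetic), -1).astype(np.float64)
    t = training.reshape(len(training), -1).astype(np.float64)

    # Center every image, then scale to unit length, so that a dot product
    # between two rows equals their Pearson correlation.
    s = s - s.mean(axis=1, keepdims=True)
    t = t - t.mean(axis=1, keepdims=True)
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + 1e-12)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-12)

    corr = s @ t.T                     # shape: (n_synthetic, n_training)
    best_idx = corr.argmax(axis=1)     # closest training image per synthetic image
    best_corr = corr[np.arange(len(s)), best_idx]
    return best_corr, best_idx


# Toy data standing in for real images (hypothetical, random 16x16 "slices").
rng = np.random.default_rng(0)
train = rng.random((50, 16, 16))

near_copy = train[7] + 0.01 * rng.standard_normal((16, 16))  # almost a copy
rescaled = 2.0 * train[3] + 0.1                              # brightness/contrast change
novel = rng.random((16, 16))                                 # unrelated image
synth = np.stack([near_copy, rescaled, novel])

best_corr, best_idx = max_correlation(synth, train)
for c, i in zip(best_corr, best_idx):
    print(f"closest training image: {i:2d}, correlation: {c:.3f}")
```

Because the Pearson correlation is invariant to linear intensity changes, a copy with altered brightness or contrast still scores close to 1, whereas an unrelated image scores much lower. For large training sets the full correlation matrix may not fit in memory, so in practice the training images would likely be processed in batches.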

Paper Structure (22 sections, 6 figures, 3 tables)

This paper contains 22 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Row 1: Sample 5-channel image from BRATS20 data. Row 2: Sample 5-channel image from BRATS21 data. Row 3: synthetic 5-channel image from StyleGAN trained with BRATS20. Row 4: synthetic 5-channel image from StyleGAN trained with BRATS21. Row 5: synthetic 5-channel image from diffusion model trained with BRATS20. Row 6: synthetic 5-channel image from diffusion model trained with BRATS21.
  • Figure 2: A comparison of randomly selected synthetic images and the training image with the highest correlation. The first column displays the T1wGd training slice with the highest correlation, while the second column presents the corresponding synthesized image. The third column shows a scatter plot comparing the two images. The fourth column shows the previous adjacent T1wGd slice of the training image, and the fifth and final column shows the next adjacent T1wGd slice. Row-wise, the first row shows samples from StyleGAN trained on BRATS20 with a correlation of 0.96247, and the second row shows samples from StyleGAN trained on BRATS21 with a correlation of 0.92620. The third row presents samples from a diffusion model trained on BRATS20 with a correlation of 0.99002, while the last row shows samples from a diffusion model trained on BRATS21 with a correlation of 0.97655. The synthetic images from the diffusion model are more or less copies of a specific training image.
  • Figure 3: First row: Comparison of memorization measured as the highest correlation between 1000 randomly selected synthetic images from StyleGAN and a diffusion model and all training images. As a baseline the same comparison is done between all test images and all training images. Clearly, the diffusion model is more prone to memorization, compared to StyleGAN. Second row: A direct comparison when using BRATS20 or BRATS21 for training. Clearly, the diffusion model is more likely to memorize the training images if the training set is smaller. Third row: The same correlation analysis between synthetic images and the test images. In general the correlations are lower, but the diffusion model still produces higher correlations.
  • Figure 4: A comparison of real and synthetic images. Row 1 presents samples from the training data. Row 2 showcases synthetic images generated using StyleGAN. Row 3 depicts synthetic images produced by the diffusion model.
  • Figure 5: Comparison between synthetic images and their closest training image. Row 1 displays a StyleGAN-generated image and the training image with the highest correlation of 0.87679 from the CXR-5216 dataset. Row 2 showcases an image produced by a diffusion model, also trained on the CXR-5216 dataset, with a correlation of 0.90564, alongside the closest training image. Rows 3 and 4 show the same comparison using the smaller CXR-1300 dataset, with a correlation of 0.89733 for the StyleGAN-generated image and 0.92314 for the diffusion model image. Accompanying these images are scatter plots for each of the four rows, providing a visual representation of the correlations.
  • ...and 1 more figures