Beware of diffusion models for synthesizing medical images -- A comparison with GANs in terms of memorizing brain MRI and chest x-ray images
Muhammad Usman Akbar, Wuhao Wang, Anders Eklund
TL;DR
The paper examines the risk that diffusion models memorize training data when synthesizing medical images, and compares a diffusion model with StyleGAN on BRATS brain MRI and chest x-ray pneumonia data. By measuring pixel-wise correlations between each synthetic image and all training images, the study finds that diffusion models memorize training data more readily than StyleGAN does, especially with small datasets and with 2D slices taken from 3D volumes. Such memorization has privacy implications for sharing synthetic medical data, and it shows that FID and IS alone cannot assess this risk. The authors discuss data-size effects, model-size effects, and potential strategies to detect and mitigate memorization in medical applications.
Abstract
Diffusion models were initially developed for text-to-image generation and are now being used to generate high-quality synthetic images. Like GANs before them, diffusion models have shown impressive results on various evaluation metrics. However, commonly used metrics such as FID and IS are not suitable for determining whether diffusion models are simply reproducing the training images. Here we train StyleGAN and a diffusion model, using BRATS20, BRATS21 and a chest x-ray pneumonia dataset, to synthesize brain MRI and chest x-ray images, and measure the correlation between the synthetic images and all training images. Our results show that diffusion models are more likely than StyleGAN to memorize the training images, especially for small datasets and when using 2D slices from 3D volumes. Researchers should be careful when using diffusion models (and to some extent GANs) for medical imaging if the final goal is to share the synthetic images.
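The correlation check described in the abstract can be illustrated with a minimal sketch. It assumes the Pearson correlation over flattened pixel intensities and images of identical size; the function name `max_train_correlation` is hypothetical, and the paper's actual preprocessing and any decision threshold are not given here.

```python
import numpy as np


def max_train_correlation(synthetic, training):
    """For each synthetic image, return the highest Pearson correlation with
    any training image, plus the index of that training image.

    synthetic: array of shape (n_syn, H, W)
    training:  array of shape (n_train, H, W)
    Assumption: all images share the same size and are already aligned.
    """
    # np.array copies, so the caller's data is not modified in place.
    syn = np.array(synthetic, dtype=np.float64).reshape(len(synthetic), -1)
    trn = np.array(training, dtype=np.float64).reshape(len(training), -1)

    # Standardize each image to zero mean and unit L2 norm; the dot product
    # of two standardized images is then their Pearson correlation.
    syn -= syn.mean(axis=1, keepdims=True)
    trn -= trn.mean(axis=1, keepdims=True)
    syn /= np.linalg.norm(syn, axis=1, keepdims=True) + 1e-12
    trn /= np.linalg.norm(trn, axis=1, keepdims=True) + 1e-12

    corr = syn @ trn.T  # shape (n_syn, n_train)
    return corr.max(axis=1), corr.argmax(axis=1)


# Toy data standing in for real scans (purely illustrative).
rng = np.random.default_rng(0)
train = rng.random((5, 8, 8))
# First "synthetic" image is a rescaled copy of train[3]; second is unrelated noise.
synthetic = np.stack([train[3] * 2.0 + 1.0, rng.random((8, 8))])
best_corr, best_idx = max_train_correlation(synthetic, train)
```

A synthetic image whose maximum correlation is close to 1 is a candidate copy of the matched training image. Because Pearson correlation is invariant to linear intensity changes, the rescaled copy in the toy data is still flagged, whereas spatial shifts or rotations would need registration first.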
