Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation

Kathrin Khadra, Utku Türkbey

TL;DR

This work proposes a memory-efficient patch-wise denoising diffusion probabilistic model (DDPM) for generating synthetic medical images, focusing on CT scans with lung nodules. The model generates high-utility synthetic images with nodule segmentation while managing memory constraints, enabling the creation of training datasets.

Abstract

The scarcity of publicly available medical imaging data limits the development of effective AI models. This work proposes a memory-efficient patch-wise denoising diffusion probabilistic model (DDPM) for generating synthetic medical images, focusing on CT scans with lung nodules. Our approach generates high-utility synthetic images with nodule segmentation while efficiently managing memory constraints, enabling the creation of training datasets. We evaluate the method in two scenarios: training a segmentation model exclusively on synthetic data, and augmenting real-world training data with synthetic images. In the first case, models trained solely on synthetic data achieve Dice scores comparable to those trained on real-world data benchmarks. In the second case, augmenting real-world data with synthetic images significantly improves segmentation performance. The generated images demonstrate their potential to enhance medical image datasets in scenarios with limited real-world data.
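
Both evaluation scenarios are scored with the Dice score. As a minimal sketch of that metric, assuming binary masks flattened into sequences of 0/1 values, the standard definition DSC = 2|A ∩ B| / (|A| + |B|) can be computed as below. The function name `dice_score`, the toy masks, and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as nodule in both the prediction and the ground truth.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total number of foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy example (hypothetical masks): 2 overlapping voxels, 3 foreground voxels each.
pred = [0, 1, 1, 0, 1, 0]
target = [0, 1, 0, 0, 1, 1]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1 means the predicted and reference masks coincide exactly, and 0 means they share no foreground voxels.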

Paper Structure

This paper contains 14 sections, 4 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: The experiment setup divides the LIDC-IDRI data into training, validation, and test data. A benchmark segmentation model is trained on the training and validation data, and the training data is also used to train the diffusion model. A segmentation model is then trained on synthetic data only (green). Additionally, synthetic data is generated based on the validation data on which the benchmark segmentation model performs worst (orange), and a segmentation model is trained on the real-world data together with this targeted synthetic data. All models are tested on the same real-world test set to obtain Dice similarity coefficients (DSCs).
  • Figure 2: Generated images for a given segmentation mask show that the model adheres to the conditioning mask while generating an image.
  • Figure 3: Difference in DSC between the Real and Real+Synthetic segmentations for the worst predicted validation samples, the other validation samples, and the test samples. The worst predicted validation CTs either improved or remained unchanged. For the other validation samples and the test samples, the majority either remained unchanged or improved, showing an overall performance improvement of the segmentation model.
  • Figure 4: Validation CT image with three nodules and the $5$ predictions of the Real and Real+Synthetic segmentation models. The Real model was only able to segment two of the nodules. After the synthetic data was added, the model could predict all three.
  • Figure 5: Validation CT image with a nodule attached to the pleura and the $5$ predictions of the Real and Real+Synthetic segmentation models. The Real model was not able to distinguish the nodule from the pleura well. After the synthetic data was added, the segmentation improved.
  • ...and 3 more figures