GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks

Christopher Bowles, Liang Chen, Ricardo Guerrero, Paul Bentley, Roger Gunn, Alexander Hammers, David Alexander Dickie, Maria Valdés Hernández, Joanna Wardlaw, Daniel Rueckert

TL;DR

This study addresses the common problem of limited labeled medical imaging data by using Generative Adversarial Networks to create synthetic training samples. A GAN is trained on joint image–label patches to model the data distribution and generate augmented data, which is then used to train segmentation networks for brain CT and MR tasks. The results show modest yet consistent improvements in Dice Similarity Coefficient, particularly when real data are scarce, and reveal that GAN augmentation can complement traditional augmentation methods. The findings suggest GAN-based augmentation is a practical, low-overhead strategy to reduce overfitting and improve generalization, with potential for broader application across datasets and architectures.
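As a rough illustration of what "joint image–label patches" and adding synthetic samples to a training set could look like in practice, the sketch below stacks an image patch and its label map into one array and then appends a fraction of synthetic patches to the real ones. The channel-stacking layout, the binary label simplification, the helper names (`to_joint_patch`, `split_joint_patch`, `augment`) and the random stand-in for generator output are all assumptions made for illustration; the summary above does not specify the authors' actual representation or pipeline.

```python
import numpy as np

def to_joint_patch(image, label):
    """Stack an image patch and its label map into one (2, H, W) array (assumed layout)."""
    return np.stack([image.astype(np.float32), label.astype(np.float32)], axis=0)

def split_joint_patch(joint, threshold=0.5):
    """Recover (image, binary label) from a joint patch by thresholding the label channel."""
    image = joint[0]
    label = (joint[1] > threshold).astype(np.uint8)
    return image, label

def augment(real, synthetic, fraction, rng):
    """Append round(fraction * len(real)) randomly chosen synthetic patches to the real set."""
    n_extra = int(round(fraction * len(real)))
    idx = rng.choice(len(synthetic), size=n_extra, replace=False)
    return np.concatenate([real, synthetic[idx]], axis=0)

rng = np.random.default_rng(0)

# Ten toy "real" joint patches of size 8x8 (random data, purely illustrative).
real = np.stack([
    to_joint_patch(rng.random((8, 8)), rng.integers(0, 2, (8, 8)))
    for _ in range(10)
])

# Hypothetical stand-in for GAN output: random arrays with the same joint layout.
synthetic = rng.random((20, 2, 8, 8)).astype(np.float32)

# Add +50% synthetic data relative to the real set.
train = augment(real, synthetic, 0.5, rng)
print(train.shape)  # (15, 2, 8, 8)
```

Generating the image and its label together means every synthetic sample arrives already annotated, which is what allows it to be fed directly to a supervised segmentation network.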

Abstract

One of the biggest issues facing the use of machine learning in medical imaging is the lack of availability of large, labelled datasets. The annotation of medical images is not only expensive and time-consuming but also highly dependent on the availability of expert observers. The limited amount of training data can inhibit the performance of supervised machine learning algorithms, which often need very large quantities of data on which to train to avoid overfitting. So far, much effort has been directed at extracting as much information as possible from what data is available. Generative Adversarial Networks (GANs) offer a novel way to unlock additional information from a dataset by generating synthetic samples with the appearance of real images. This paper demonstrates the feasibility of introducing GAN-derived synthetic data to the training datasets in two brain segmentation tasks, leading to improvements in Dice Similarity Coefficient (DSC) of between 1 and 5 percentage points under different conditions, with the strongest effects seen when fewer than ten training image stacks are available.
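For readers unfamiliar with the metric, the Dice Similarity Coefficient between a predicted mask A and a reference mask B is conventionally defined as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for binary masks with NumPy. The function name and the convention of returning 1.0 for two empty masks are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def dice_similarity_coefficient(pred, truth):
    """Return 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    overlap = np.logical_and(pred, truth).sum()
    return float(2.0 * overlap / total)

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
truth = np.array([[1, 0, 0],
                  [0, 1, 1]])

# Overlap is 2 pixels; each mask has 3 foreground pixels, so DSC = 4 / 6.
print(round(dice_similarity_coefficient(pred, truth), 4))  # 0.6667
```

On this 0-to-1 scale, an improvement of 1 to 5 percentage points corresponds to, for example, moving from a DSC of 0.80 to somewhere between 0.81 and 0.85.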

Paper Structure

This paper contains 9 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Examples of real and generated synthetic patches. Left . Red: Cortical . Green: Brain stem . Blue: Ventricular . Right: .
  • Figure 2: segmentation on CT. Left: Average for each class (coloured) and mean across classes (black) as availability of real data varies. Solid lines show performance without augmentation, dashed lines show performance with +50% synthetic data, and dot/dashed lines show the difference, indicating the improvement seen with augmentation. Right: Average observed using a UNet as synthetic data is added, when 100%, 50% and 10% of the total amount of real data is used. Each coloured dot represents an experiment. Black circles show the mean with filled circles indicating results significantly different from the baseline.
  • Figure 3: Synthetic images (top of pair) with their nearest neighbours in the training set (bottom of pair) from trained on patches from 5, 25 and 50 real images. Some local signs of successful augmentation are indicated using green (same lesions, different anatomy) and yellow (same anatomy, different lesions) arrows, and novel images (new anatomy and lesions) are shown with blue dots.