Analyzing Data Augmentation for Medical Images: A Case Study in Ultrasound Images

Adam Tupper; Christian Gagné

TL;DR

Data scarcity in medical imaging motivates a systematic evaluation of augmentation strategies. The authors compare individual, paired, and random-sampling augmentations on breast lesion classification in ultrasound, using $5 \times 2$ cross-validation across the BUSI and BUS-BRA datasets with a ResNet-18 baseline. They find substantial variability in the effectiveness of individual transforms, but randomly sampling from a diverse augmentation pool with TrivialAugment yields consistent, sometimes dramatic, gains of up to about $10.4\%$ in balanced accuracy. The work provides practical guidance on augmentation policy, showing that random augmentation pools outperform fixed sequences, and offers a blueprint for rigorous augmentation studies across modalities. Overall, it advances the standardization of data augmentation in breast ultrasonography and informs augmentation strategies for broader medical-imaging tasks.
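The evaluation protocol above combines two ideas that are easy to get subtly wrong: $5 \times 2$ cross-validation (five repetitions of 2-fold cross-validation, each with a fresh random split) and balanced accuracy (the mean of per-class recall). The following is a minimal sketch under stated assumptions, not the authors' code: the function names five_by_two_splits and balanced_accuracy are hypothetical, and stratifying each split by class label is an assumption, since the summary does not say how the folds were built (for example, whether images were grouped by patient).

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def five_by_two_splits(
    labels: Sequence[int], seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield 10 (train, validation) index splits: 5 repeats of 2-fold CV.

    Assumption for this sketch: each split is stratified by class label.
    """
    rng = random.Random(seed)
    # Group sample indices by class so both halves keep the class ratio.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(5):
        half_a: List[int] = []
        half_b: List[int] = []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            mid = len(shuffled) // 2
            half_a.extend(shuffled[:mid])
            half_b.extend(shuffled[mid:])
        # Each half is used once for training and once for validation.
        yield sorted(half_a), sorted(half_b)
        yield sorted(half_b), sorted(half_a)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    labels = [0] * 6 + [1] * 4
    splits = list(five_by_two_splits(labels, seed=42))
    print(len(splits))  # 10
    # Plain accuracy would be 0.75 here; balanced accuracy exposes the missed minority class.
    print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
```

Balanced accuracy is the natural choice when lesion classes are imbalanced, because a classifier that always predicts the majority class scores only $1/K$ for $K$ classes instead of the majority-class frequency.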

Abstract

Data augmentation is one of the most effective techniques to improve the generalization performance of deep neural networks. Yet, although medical image analysis often faces limited data availability, augmentation is frequently underutilized in this field. This appears to be due to a gap in our collective understanding of the efficacy of different augmentation techniques across medical imaging tasks and modalities. One domain where this is especially true is breast ultrasound imaging. This work addresses this issue by analyzing the effectiveness of different augmentation techniques for the classification of breast lesions in ultrasound images. We assess the generalizability of our findings across several datasets, demonstrate that certain augmentations are far more effective than others, and show that their usage leads to significant performance gains.

Paper Structure (16 sections, 5 figures, 2 tables)

Introduction
Methodology
Breast Lesion Classification in Ultrasound Images
Augmentation Operations
Evaluation Protocol
Results
Augmentations' Individual Effectiveness
Combining Pairs of Augmentations
TrivialAugment
Discussion
Related Work
Conclusions
Disclosure of Interests.
Illustrations of Data Augmentations
Geometric vs. Photometric Transforms for TrivialAugment
...and 1 more section

Figures (5)

Figure 1: The change in balanced validation accuracy for each data augmentation across the three tasks.
Figure 2: Change in balanced validation accuracy using two sequential operations compared to using Operation 1 alone.
Figure 3: Change in balanced validation accuracy using TrivialAugment as the size of the augmentation pool and number of operations increased for each task.
Figure 4: An illustration of the maximum strength of each augmentation when applied to an image from the Breast Ultrasound Image (BUSI) dataset.
Figure 5: A comparison between the use of only photometric or geometric transforms with TrivialAugment.
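Figure 3 varies two knobs of TrivialAugment: the size of the augmentation pool and the number of operations applied per image. TrivialAugment, as originally proposed, draws one operation uniformly at random from the pool and a strength uniformly at random for each image. The sketch below illustrates how the two knobs interact, under assumptions labeled here: the toy operations brightness, contrast, and hflip and the function trivial_augment are hypothetical stand-ins (the paper's actual operation list is not reproduced in this summary), strengths are drawn from a continuous range [0, 1] rather than discrete bins, and when num_ops is greater than 1 the operations are drawn independently with replacement. Shrinking the pool dictionary models a smaller augmentation pool; num_ops=1 corresponds to standard TrivialAugment.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # toy grayscale image, intensities in [0, 1]
Op = Callable[[Image, float], Image]


def _clip(v: float) -> float:
    return min(1.0, max(0.0, v))


# Hypothetical toy operations; magnitude m lies in [0, 1].
def brightness(img: Image, m: float) -> Image:
    return [[_clip(p + 0.3 * m) for p in row] for row in img]


def contrast(img: Image, m: float) -> Image:
    mean = sum(map(sum, img)) / sum(len(r) for r in img)
    factor = 1.0 + m  # stronger magnitude stretches intensities away from the mean
    return [[_clip(mean + factor * (p - mean)) for p in row] for row in img]


def hflip(img: Image, m: float) -> Image:
    return [row[::-1] for row in img]  # magnitude is ignored for flips


def trivial_augment(
    img: Image, pool: Dict[str, Op], num_ops: int, rng: random.Random
) -> Tuple[Image, List[Tuple[str, float]]]:
    """Apply num_ops operations in sequence, each drawn uniformly from the pool
    (with replacement, an assumption) with a uniformly sampled strength."""
    applied = []
    for _ in range(num_ops):
        name = rng.choice(list(pool))
        magnitude = rng.uniform(0.0, 1.0)
        img = pool[name](img, magnitude)
        applied.append((name, magnitude))
    return img, applied


if __name__ == "__main__":
    pool = {"brightness": brightness, "contrast": contrast, "hflip": hflip}
    image = [[0.2, 0.4], [0.6, 0.8]]
    augmented, applied = trivial_augment(image, pool, num_ops=2, rng=random.Random(0))
    print(applied)
    print(augmented)
```

Because a fresh operation and strength are sampled for every image, no hyperparameter search over a fixed augmentation sequence is needed; for real training pipelines, torchvision ships a ready-made TrivialAugmentWide transform with its own operation pool.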