Mitigating annotation shift in cancer classification using single image generative models

Marta Buetas Arcas, Richard Osuala, Karim Lekadir, Oliver Díaz

TL;DR

This work tackles annotation shift in breast cancer classification from mammography by simulating shifts via varying annotation tightness, quantifying their impact, and mitigating them with single-image generative models (SinGAN). It demonstrates that malignant-class performance is most sensitive to annotation shifts and that SinGAN-based data augmentation, especially when combined with traditional oversampling in an ensemble, substantially improves robustness, with fidelity of generated images supported by SiFID metrics. The approach requires as few as four in-domain annotations to generate diverse, in-domain variations, addressing data scarcity and class imbalance. Overall, the study shows the feasibility of one-shot generative augmentation to reduce domain shift in medical imaging and informs future strategies for robust computer-aided diagnosis (CAD) in mammography.

Abstract

Artificial Intelligence (AI) has emerged as a valuable tool for assisting radiologists in breast cancer detection and diagnosis. However, the success of AI applications in this domain is restricted by the quantity and quality of available data, posing challenges due to limited and costly data annotation procedures that often lead to annotation shifts. This study simulates, analyses and mitigates annotation shifts in cancer classification in the breast mammography domain. First, a high-accuracy cancer risk prediction model is developed, which effectively distinguishes benign from malignant lesions. Next, model performance is used to quantify the impact of annotation shift. We uncover a substantial impact of annotation shift on multiclass classification performance, particularly for malignant lesions. We thus propose a training data augmentation approach based on single-image generative models for the affected class, requiring as few as four in-domain annotations to considerably mitigate annotation shift, while also addressing dataset imbalance. Lastly, we further increase performance by proposing and validating an ensemble architecture based on multiple models trained under different data augmentation regimes. Our study offers key insights into annotation shift in deep learning breast cancer classification and explores the potential of single-image generative models to overcome domain shift challenges.
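
The ensemble idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the models' outputs are combined, so simple probability averaging (soft voting) is assumed here purely for illustration. The function names, the three-class ordering (healthy, benign, malignant), and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_probabilities(
    per_model_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average class probabilities over models (soft voting, an assumption).

    per_model_probs[m][i][c] is the probability that model m assigns
    to class c for sample i.
    """
    n_models = len(per_model_probs)
    n_samples = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    averaged = []
    for i in range(n_samples):
        row = [
            sum(per_model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        averaged.append(row)
    return averaged


def predict_labels(probs: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest-probability class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Hypothetical outputs of two classifiers, e.g. one trained with
# oversampling and one trained with synthetic augmentation.
# Assumed class order: 0 = healthy, 1 = benign, 2 = malignant.
model_oversampled = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
model_synthetic = [[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]

ensemble_probs = average_probabilities([model_oversampled, model_synthetic])
ensemble_labels = predict_labels(ensemble_probs)
print(ensemble_labels)
```

Averaging keeps each model's confidence in play, so a model that is strongly confident about the malignant class can outweigh a weakly confident dissenting model. Other combination rules, such as majority voting or learned weights, would fit the same interface.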

Paper Structure (12 sections, 2 equations, 7 figures, 4 tables)

This paper contains 12 sections, 2 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: General pipeline of experiments. Patches from healthy and lesion samples are extracted from the BCDR dataset, with three patches at distinct zoom levels (G1, G2, G3) for each lesion. The dataset is randomly split into training, validation, and test sets using three folds for all experiments, ensuring that images from the same patient are in a single set. To augment the malignant class, patches from G1 are selected to train different SinGAN models individually, each on a distinct selected patch. Once trained, a synthetic dataset is created by assembling generated samples from n SinGAN models. This synthetic dataset is added to the training dataset to balance it.
  • Figure 2: Digital mammogram with a biopsy-proven malignant lesion and its corresponding lesion annotation mask. The third to fifth columns show the extracted patches, ranging from region-of-interest zoom level group 1 (G1) through group 2 (G2) to group 3 (G3), with an increasing extent of non-lesion tissue visible on the patch.
  • Figure 3: Implemented pipeline of the SinGAN framework adopted from Shaham et al. (SinGAN). One GAN operates at each of the $n$ different SinGAN image scales. The training process starts with the coarsest scale and progresses to the finest scale. Each GAN in the hierarchy learns to generate realistic images at its respective scale, capturing both global and local details. At each scale $s_{n}$, the image from the previous scale, $\tilde{x}_{n+1}$, is upsampled and added to the input noise map, $z_n$. The result is fed into the generator ($G_n$), whose output is the residual image $\tilde{x}_{n}$ of scale $s_{n}$, which is fed into the discriminator $D_n$ during training and passed to the next scale during inference.
  • Figure 4: Area under the Receiver Operating Characteristic (ROC) Curve for each class, computed using the One-vs-Rest (OvR) strategy when training and testing on different zoom levels. The vertical bars in the plot represent the standard deviation. The results reveal varying model robustness under annotation shift.
  • Figure 5: Original samples used for training each SinGAN model, alongside two synthetic samples generated by the respective trained model. There are two samples for each format (film and digital), and for each sample, its corresponding patch from zoom groups G1 and G3. To augment the dataset, only samples generated from models trained on a sample from group 1 (G1) were used. This presentation aims to demonstrate the realism achieved by the SinGAN models in generating synthetic content.
  • ...and 2 more figures
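
The patient-wise split mentioned in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `patient_wise_folds`, the round-robin assignment, and the example patient identifiers are hypothetical, and the caption does not specify how the three folds map onto training, validation, and test sets.

```python
import random
from typing import Dict, List


def patient_wise_folds(
    patch_patient_ids: List[str], n_folds: int = 3, seed: int = 0
) -> List[int]:
    """Assign a fold index to every patch so that all patches from the
    same patient share one fold (no patient leaks across sets)."""
    # Work on unique patients, not on patches, so the grouping is respected.
    patients = sorted(set(patch_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Round-robin over the shuffled patients keeps fold sizes balanced
    # in terms of patient count (an illustrative choice).
    fold_of_patient: Dict[str, int] = {
        patient: index % n_folds for index, patient in enumerate(patients)
    }
    return [fold_of_patient[patient] for patient in patch_patient_ids]


# Hypothetical example: one patient ID per extracted patch.
patch_patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6"]
folds = patient_wise_folds(patch_patient_ids, n_folds=3, seed=0)

for patient_id, fold in zip(patch_patient_ids, folds):
    print(patient_id, "-> fold", fold)
```

Splitting on patients rather than on patches matters here because each lesion yields several patches (G1, G2, G3); a patch-level random split could place near-identical views of the same lesion in both training and test sets and inflate the measured performance.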