
Radio-opaque artefacts in digital mammography: automatic detection and analysis of downstream effects

Amelia Schueppert, Ben Glocker, Mélanie Roschewitz

TL;DR

Investigates how radio-opaque artefacts in mammography affect machine learning classifiers used for cancer screening and density assessment. Builds a large, manually annotated artefact dataset (22,012 images) and trains a multi-label detector with ResNet-50 to identify five artefact types. Demonstrates artefacts are prevalent (≈22% of images) and can significantly degrade downstream task performance and shift output distributions and thresholds. All annotations, code, and predictions are released to facilitate robust evaluation and bias-aware development of mammography AI systems.

Abstract

This study investigates the effects of radio-opaque artefacts, such as skin markers, breast implants, and pacemakers, on mammography classification models. After manually annotating 22,012 mammograms from the publicly available EMBED dataset, a robust multi-label artefact detector was developed to identify five distinct artefact types (circular and triangular skin markers, breast implants, support devices and spot compression structures). Subsequent experiments on two clinically relevant tasks (breast density assessment and cancer screening) revealed that these artefacts can significantly affect model performance, alter classification thresholds, and distort output distributions. These findings underscore the importance of accurate automatic artefact detection for developing reliable and robust classification models in digital mammography. To facilitate future research, our annotations, code, and model predictions are made publicly available.
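To make the multi-label setup concrete, the following is a minimal sketch of how per-image artefact annotations could be encoded as five independent binary targets and scored per label with balanced accuracy. The label names, the 0.5 decision threshold, and all helper functions are illustrative assumptions rather than the authors' released code. The ResNet-50 backbone itself is not reproduced here; the sketch only assumes that some model produces one logit per artefact type.

```python
import math

# Hypothetical label names for the five artefact types listed in the abstract.
ARTEFACTS = [
    "circle_marker",
    "triangle_marker",
    "implant",
    "support_device",
    "spot_compression",
]


def encode(present):
    """Multi-hot target: one independent 0/1 entry per artefact type."""
    return [1 if name in present else 0 for name in ARTEFACTS]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(logits, threshold=0.5):
    """Apply a separate sigmoid per label, so several artefacts can be
    flagged on the same image (unlike a softmax over exclusive classes)."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for a single binary label."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("balanced accuracy needs both classes present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Toy usage with made-up logits for three images.
targets = [encode({"implant"}), encode({"circle_marker", "triangle_marker"}), encode(set())]
logits = [[-3, -2, 4, -1, -5], [2, 1, -3, -2, -4], [-2, -3, -1, -4, -2]]
preds = [predict(row) for row in logits]
implant_idx = ARTEFACTS.index("implant")
implant_bacc = balanced_accuracy(
    [t[implant_idx] for t in targets], [p[implant_idx] for p in preds]
)
```

Balanced accuracy is a natural per-label metric here because artefacts occur in only a minority of images, so plain accuracy would be dominated by the artefact-free majority.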

Paper Structure (12 sections, 5 figures, 3 tables)


Figures (5)

  • Figure 1: Radio-opaque artefacts considered in this study: skin markers (circular and triangular), breast implants, support devices (e.g. pacemakers) and spot compression (or magnification) devices. Red arrows highlight the artefacts of interest.
  • Figure 2: Artefact detection performance: balanced accuracy exceeds 98% for all artefact types.
  • Figure 3: Confusion matrices for breast cancer screening, per marker subgroup. The classification threshold is fixed across all subgroups and chosen such that the global sensitivity and specificity are equal. On images with triangular markers, breast implants and support devices, the sensitivity-specificity balance is substantially degraded: specificity is very low compared to images without markers.
  • Figure 4: Effect of artefacts on the output distribution of the breast cancer screening model. From top to bottom: triangular skin markers, breast implants and support devices.
  • Figure 5: Confusion matrices for breast density classification, per marker subgroup. Class-wise accuracies are substantially shifted on images with breast implants and support devices.
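
The evaluation protocol described in the Figure 3 caption (one global threshold chosen so that sensitivity and specificity are equal, then applied unchanged to each subgroup) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names are hypothetical, the threshold search simply scans the observed scores for the smallest gap between sensitivity and specificity, and the scores, labels, and subgroup tags are synthetic toy values.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        positive = s >= threshold
        if y == 1:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def equal_balance_threshold(scores, labels):
    """Observed score that minimises |sensitivity - specificity| on the full set."""
    best_gap, best_t = None, None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        gap = abs(sens - spec)
        if best_gap is None or gap < best_gap:
            best_gap, best_t = gap, t
    return best_t


def subgroup_report(scores, labels, groups, threshold):
    """Apply the same fixed threshold to each subgroup separately."""
    report = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        report[g] = sens_spec(
            [scores[i] for i in idx], [labels[i] for i in idx], threshold
        )
    return report


# Synthetic toy data: negatives with implants receive inflated scores.
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.8, 0.9, 0.7, 0.75, 0.85]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
groups = ["none"] * 8 + ["implant"] * 3

threshold = equal_balance_threshold(scores, labels)
report = subgroup_report(scores, labels, groups, threshold)
for name, (sens, spec) in report.items():
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy setup the globally balanced threshold leaves the artefact-free subgroup with high specificity while the implant subgroup loses it, which is the kind of subgroup shift the per-marker confusion matrices are meant to expose.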