Region-of-Interest Augmentation for Mammography Classification under Patient-Level Cross-Validation

Farbod Bigdeli, Mohsen Mohammadagha, Ali Bigdeli

TL;DR

This work addresses memory and data-scarcity constraints in mammography classification with a lightweight, training-time ROI augmentation: full images are probabilistically replaced by random crops sampled from a precomputed, label-free ROI bank, with optional jitter. Evaluated under strict patient-level cross-validation on Mini-DDSM, the best configuration yields modest average ROC-AUC gains over a full-image baseline, with performance varying across folds and PR-AUC flat to slightly lower. Inference cost is unchanged and no extra annotations are required. The results point to a data-centric improvement for existing CNN pipelines and suggest possible applicability to other high-resolution medical images, subject to limitations tied to dataset resolution and fixed ROI proposals. Overall, the method offers a practical baseline for constrained mammography datasets.

Abstract

Breast cancer screening with mammography remains central to early detection and mortality reduction. Deep learning has shown strong potential for automating mammogram interpretation, yet limited-resolution datasets and small sample sizes continue to restrict performance. We revisit the Mini-DDSM dataset (9,684 images; 2,414 patients) and introduce a lightweight region-of-interest (ROI) augmentation strategy. During training, full images are probabilistically replaced with random ROI crops sampled from a precomputed, label-free bounding-box bank, with optional jitter to increase variability. We evaluate under strict patient-level cross-validation and report ROC-AUC, PR-AUC, and training-time efficiency metrics (throughput and GPU memory). Because ROI augmentation is training-only, inference-time cost remains unchanged. On Mini-DDSM, ROI augmentation (best: p_roi = 0.10, alpha = 0.10) yields modest average ROC-AUC gains, with performance varying across folds; PR-AUC is flat to slightly lower. These results demonstrate that simple, data-centric ROI strategies can enhance mammography classification in constrained settings without requiring additional labels or architectural modifications.
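The training-time replacement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the box format (x0, y0, x1, y1), and the reading of alpha as a per-edge jitter expressed as a fraction of the box size are all assumptions made for illustration, and the image is a plain nested list so the example stays dependency-free.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Image = List[List[int]]          # toy stand-in for a 2D pixel array


def jitter_box(box: Box, width: int, height: int, alpha: float,
               rng: random.Random) -> Box:
    """Shift each edge by up to alpha * box size, then clip to the image.

    Assumption: alpha is a fraction of the box width/height; the paper's
    exact jitter definition is not given in this section.
    """
    x0, y0, x1, y1 = box
    dx = alpha * (x1 - x0)
    dy = alpha * (y1 - y0)
    nx0 = x0 + rng.uniform(-dx, dx)
    ny0 = y0 + rng.uniform(-dy, dy)
    nx1 = x1 + rng.uniform(-dx, dx)
    ny1 = y1 + rng.uniform(-dy, dy)
    # Clip to the image and keep at least one pixel per dimension.
    cx0 = min(max(int(round(nx0)), 0), width - 1)
    cy0 = min(max(int(round(ny0)), 0), height - 1)
    cx1 = min(max(int(round(nx1)), cx0 + 1), width)
    cy1 = min(max(int(round(ny1)), cy0 + 1), height)
    return (cx0, cy0, cx1, cy1)


def roi_augment(image: Image, bank: Sequence[Box], p_roi: float,
                alpha: float, rng: random.Random) -> Image:
    """With probability p_roi return a jittered ROI crop, else the full image."""
    if not bank or rng.random() >= p_roi:
        return image
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = jitter_box(rng.choice(bank), width, height, alpha, rng)
    return [row[x0:x1] for row in image[y0:y1]]
```

In a real pipeline the crop would typically be resized to the network's input size before batching. Because the replacement happens only in the training data loader, evaluation always sees full images, which is why inference-time cost is unaffected.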

Paper Structure

This paper contains 17 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Mini-DDSM examples across class, side, and view.
  • Figure 2: Hyperparameter sweep of ROI probability and jitter (mean patient ROC–AUC).
  • Figure 3: Patient-level ROC–AUC over epochs: Full vs best ROI config, mean±SD across folds.
  • Figure 4: Throughput and peak VRAM: Full vs ROI-aug (best). Error bars omitted for clarity (averaged over folds).
  • Figure 5: Per-fold patient ROC–AUC: Full vs best ROI configuration.
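The per-fold, patient-level results in Figures 3 and 5 depend on splits in which no patient contributes images to both the training and validation sides. A minimal sketch of such a split follows; the function name, the round-robin assignment of patients to folds, and the fold count are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def patient_level_folds(patient_ids: Sequence[str], n_folds: int,
                        seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, val_idx) pairs in which no patient spans both sides.

    patient_ids[i] is the patient that image i belongs to.
    """
    by_patient: Dict[str, List[int]] = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients (not images) so every image of a patient moves together.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    folds = []
    for k in range(n_folds):
        val_patients = set(patients[k::n_folds])  # round-robin assignment
        val_idx = sorted(i for p in val_patients for i in by_patient[p])
        train_idx = [i for i, pid in enumerate(patient_ids)
                     if pid not in val_patients]
        folds.append((train_idx, val_idx))
    return folds
```

Splitting by image instead of by patient would let different views of the same breast land on both sides of a split, which tends to inflate validation metrics.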