M&M: Tackling False Positives in Mammography with a Multi-view and Multi-instance Learning Sparse Detector

Yen Nhi Truong Vu, Dan Guo, Ahmed Taha, Jason Su, Thomas Paul Matthews

TL;DR

M&M tackles false positives in screening mammography by modeling three clinical realities: a malignant image typically contains a single malignant finding, each breast is imaged in two views (CC and MLO), and most images are negative. It introduces an end-to-end framework that combines Sparse R-CNN with dual classification heads, a cross-view attention module, and multi-instance learning (MIL) for training on unannotated images, with MIL aggregation such as NoisyOR producing image- and breast-level predictions. The method achieves strong detection and breast-level classification performance across five datasets, including 87.7% recall at 0.1 FP/image on OPTIMAM, an FP gap reduced to 3.5 points, and high breast-level AUCs. The work highlights the value of sparsity, cross-view reasoning, and MIL for clinically relevant mammography analysis and demonstrates practical improvements over dense detectors.

Abstract

Deep-learning-based object detection methods show promise for improving screening mammography, but high rates of false positives can hinder their effectiveness in clinical practice. To reduce false positives, we identify three challenges: (1) unlike natural images, a malignant mammogram typically contains only one malignant finding; (2) mammography exams contain two views of each breast, and both views ought to be considered to make a correct assessment; (3) most mammograms are negative and do not contain any findings. In this work, we tackle the three aforementioned challenges by: (1) leveraging Sparse R-CNN and showing that sparse detectors are more appropriate than dense detectors for mammography; (2) including a multi-view cross-attention module to synthesize information from different views; (3) incorporating multi-instance learning (MIL) to train with unannotated images and perform breast-level classification. The resulting model, M&M, is a Multi-view and Multi-instance learning system that can both localize malignant findings and provide breast-level predictions. We validate M&M's detection and classification performance using five mammography datasets. In addition, we demonstrate the effectiveness of each proposed component through comprehensive ablation studies.
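The MIL step in point (3) can be illustrated with NoisyOR pooling, which turns per-box malignancy probabilities into one image-level score and then one breast-level score. The following is a minimal sketch, not the authors' implementation: the function names `noisy_or` and `breast_score` are hypothetical, and it assumes each view contributes a list of independent per-box probabilities. The pooling actually used in M&M may differ in detail.

```python
import numpy as np


def noisy_or(probs):
    """NoisyOR pooling: probability that at least one instance is positive.

    Computes 1 - prod(1 - p_i). An empty bag yields 0.0.
    """
    probs = np.clip(np.asarray(probs, dtype=float), 0.0, 1.0)
    return float(1.0 - np.prod(1.0 - probs))


def breast_score(cc_box_probs, mlo_box_probs):
    """Hypothetical two-stage pooling: boxes -> image, then images -> breast."""
    image_cc = noisy_or(cc_box_probs)    # image-level score for the CC view
    image_mlo = noisy_or(mlo_box_probs)  # image-level score for the MLO view
    return noisy_or([image_cc, image_mlo])


if __name__ == "__main__":
    # Illustrative probabilities only, not results from the paper.
    print(breast_score([0.5, 0.1], [0.2]))
```

Because NoisyOR needs only a label for the whole bag, an image without box annotations can still supervise the detector through its image-level or breast-level label.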

Paper Structure (15 sections, 4 equations, 5 figures, 8 tables, 1 algorithm)

This paper contains 15 sections, 4 equations, 5 figures, 8 tables, 1 algorithm.

Figures (5)

  • Figure 1: Two gaps between deep learning literature and clinical applicability. (a) Few works report detailed performance in the clinically relevant region of less than 1 FP/image. M&M surpasses previous works by a large margin in this region. (b) Typical evaluation datasets are not representative: they contain from zero (CBIS-DDSM [lee2017curated]) to few negative cases (DDSM [ddsm], INBreast [moreira2012inbreast]). To illustrate the distribution shift, we train four popular dense detectors using a standard setup that includes only annotated malignant and benign cases [agarwal2019automatic, cvr-rcnn, bg-rcnn]. We utilize OPTIMAM [optimam], a large dataset with a significant proportion of negatives (see the dataset table), for training and evaluation. Across all dense models, there is a large performance drop in the clinically representative setting that includes negative images. This means that the dense models are producing too many FPs on negative images. Our model, M&M, successfully tackles this performance gap.
  • Figure 2: M&M tackles false positives through (1, blue, dotted arrows) leveraging the Sparse R-CNN cascade architecture to iteratively refine sparse learnable proposals into predictions, (2, red, solid arrows) incorporating a cross-attention module to reason about relations between objects across two views, and (3, green, dashed arrows) utilizing image and breast MIL pooling to train with images that do not have lesion annotations.
  • Figure 3: Qualitative Evaluation. Left: Model without multi-view (row 4 of the component ablation table) produces a loose box on the CC view and misses the finding on the MLO view. Right: M&M produces tight boxes around the finding in both views.
  • Figure 4: Effect of M&M's components on classification and detection performance.
  • Figure A1: Additional Qualitative Evaluation. Left: without multi-view, the model misses a mass on the CC view even though it was able to detect the mass on the MLO view. Right: with multi-view, M&M recalls the mass on both views.
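
The cross-view reasoning described in the Figure 2 caption can be sketched as single-head cross-attention, where proposal features from one view attend to proposal features from the other view. This is an illustrative sketch under stated assumptions, not the module used in M&M: the function `cross_view_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and the feature sizes are all hypothetical choices made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(query_feats, other_feats, w_q, w_k, w_v):
    """Update proposals of one view using proposals of the other view.

    query_feats: (n, d) proposal features of the view being updated.
    other_feats: (m, d) proposal features of the paired view.
    w_q, w_k, w_v: (d, d) projection matrices (assumed square here).
    Returns the updated (n, d) features and the (n, m) attention weights.
    """
    q = query_feats @ w_q
    k = other_feats @ w_k
    v = other_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return query_feats + weights @ v, weights  # residual update


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    cc = rng.normal(size=(4, d))   # 4 hypothetical CC proposals
    mlo = rng.normal(size=(5, d))  # 5 hypothetical MLO proposals
    w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
    updated_cc, attn = cross_view_attention(cc, mlo, w_q, w_k, w_v)
    print(updated_cc.shape, attn.shape)
```

Swapping the two arguments updates the MLO proposals from the CC view, so each view can be refined with evidence from the other.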