A training regime to learn unified representations from complementary breast imaging modalities

Umang Sharma; Jungkyu Park; Laura Heacock; Sumit Chopra; Krzysztof Geras

TL;DR

The paper addresses the challenge of leveraging complementary information across FFDM, DBT, and synthetic mammograms (SMs, 2D images generated from the DBT acquisition) to improve breast lesion detection while potentially reducing radiation dose and exam time. It introduces a three-stage training framework built from modules A, B, and C: A and C are trained for detection with $L_{\text{det}}$ on SMs and FFDMs respectively, B is trained with $L_{\text{sim}}$ so that its SM-derived features align with the FFDM representations, and the features of A and B are concatenated into a fused representation $h_{\text{fused}}$ that requires only SMs at inference. Evaluated on the large-scale nyudbt dataset, the approach outperforms single-modality baselines and approaches the performance of an upper-bound fusion model, demonstrating effective cross-modal knowledge transfer. The work has practical implications for reducing reliance on FFDM while preserving diagnostic accuracy, potentially lowering radiation exposure and examination time in breast cancer screening, and it highlights a general strategy for multimodal representation learning in medical imaging.
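The three training stages can be illustrated with a toy sketch. It is not the paper's implementation and makes several loud assumptions: the modules A, B and C are stand-ins implemented as single linear maps, $L_{\text{sim}}$ is assumed to be a mean squared error between B's SM features and the frozen C's FFDM features, only the $L_{\text{sim}}$ part of stage II is shown ($L_{\text{det}}$ is omitted), and paired SM/FFDM inputs from the same exam are assumed. All function names, weights and inputs are hypothetical.

```python
# Toy sketch of the three-stage idea (NOT the authors' code).
# ASSUMPTIONS: modules are single linear maps, L_sim is mean squared error,
# C is frozen during stage II, and L_det is omitted entirely.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def encode(weights: Matrix, x: Vector) -> Vector:
    """Hypothetical stand-in for module A, B or C: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sim_loss(h_sm: Vector, h_ffdm: Vector) -> float:
    """Assumed form of L_sim: mean squared error between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(h_sm, h_ffdm)) / len(h_sm)


def stage2_step(w_b: Matrix, w_c: Matrix, x_sm: Vector, x_ffdm: Vector,
                lr: float = 0.1) -> Matrix:
    """One gradient-descent step on B's weights; C only supplies fixed targets."""
    target = encode(w_c, x_ffdm)  # FFDM features from the frozen module C
    h_b = encode(w_b, x_sm)       # SM features from the trainable module B
    d = len(h_b)
    # For the MSE above, dL_sim/dW_b[i][j] = 2/d * (h_b[i] - target[i]) * x_sm[j].
    return [[w - lr * 2.0 / d * (h_b[i] - target[i]) * x_sm[j]
             for j, w in enumerate(row)]
            for i, row in enumerate(w_b)]


def fused_representation(w_a: Matrix, w_b: Matrix, x_sm: Vector) -> Vector:
    """Stage III / inference: concatenate A(SM) and B(SM); no FFDM input is used."""
    return encode(w_a, x_sm) + encode(w_b, x_sm)


if __name__ == "__main__":
    x_sm, x_ffdm = [1.0, 0.5], [0.8, 1.2]  # made-up paired inputs for one exam
    w_a = [[0.2, -0.1], [0.0, 0.3]]        # stage I: pretend A was trained on SMs
    w_c = [[0.5, 0.1], [-0.2, 0.4]]        # stage I: pretend C was trained on FFDMs
    w_b = [[0.0, 0.0], [0.0, 0.0]]         # stage II: B starts untrained
    before = sim_loss(encode(w_b, x_sm), encode(w_c, x_ffdm))
    for _ in range(50):
        w_b = stage2_step(w_b, w_c, x_sm, x_ffdm)
    after = sim_loss(encode(w_b, x_sm), encode(w_c, x_ffdm))
    print(f"L_sim before: {before:.4f}, after: {after:.2e}")
    print("h_fused:", fused_representation(w_a, w_b, x_sm))
```

Running it shows $L_{\text{sim}}$ shrinking as B's SM features move toward C's FFDM features, while `fused_representation` takes only the SM input. The design point the sketch mirrors is that C, and therefore the FFDM image, is needed only during training.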

Abstract

Full Field Digital Mammograms (FFDMs) and Digital Breast Tomosynthesis (DBT) are the two most widely used imaging modalities for breast cancer screening. Although DBT has increased cancer detection compared to FFDM, its widespread adoption in clinical practice has been slowed by increased interpretation times and a perceived decrease in the conspicuity of specific lesion types. Specifically, the non-inferiority of DBT for microcalcifications remains under debate. Due to concerns about the decrease in visual acuity, combined DBT-FFDM acquisitions remain popular, leading to overall increased exam times and radiation dosage. Enabling DBT to provide diagnostic information present in both FFDM and DBT would reduce reliance on FFDM, resulting in a reduction in both quantities. We propose a machine learning methodology that learns high-level representations leveraging the complementary diagnostic signal from both DBT and FFDM. Experiments on a large-scale data set validate our claims and show that our representations enable more accurate breast lesion detection than any DBT- or FFDM-based model.

Paper Structure

This paper contains 8 sections, 2 equations, 11 figures, and 2 tables.

Introduction
Background and Related Work
Complementary Information Between FFDM and SM
Learning From Complementary Imaging Modalities
Experiments and Results
Discussion and Conclusion
Acknowledgments.
Disclosure of Interests.

Figures (11)

Figure 1: Examples of complementary information in FFDMs and SMs. In each pair, the right image is FFDM and the left is the corresponding SM. Red and green boxes are lesions marked by an expert and a model, respectively. A model using FFDM and a model using SM capture different sets of lesions.
Figure 2: Framework to compare FFDMs against SMs for disease detection.
Figure 3: Proposed framework to learn representations from SM encoding knowledge from FFDM. Training stages: (I) A and C are trained individually on SMs and FFDMs; (II) B is trained to produce representations of SMs that borrow knowledge from FFDMs; (III) Fused representations are obtained by concatenating the features from A and B. At inference, only SMs are used to generate the fused representations $h_{\textsc{fused}}$ which are then processed to make predictions.
Figure : (A) The fused model captures lesions missed by Model$_{\textsc{sm}}$. Although Model$_{\textsc{ffdm}}$ also captures them, the fused model achieves this without ever seeing the FFDM image.
Figure S5: False positive misclassifications made only by the FFDM model. In each set, the SM image is on the left and the FFDM image is on the right.
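Figures 1, 2 and S5 compare which expert-marked lesions each single-modality model captures and where it produces false positives. The sketch below shows one way such a per-lesion comparison could be tabulated. It is a hypothetical illustration, not the paper's evaluation protocol: the box format, a shared coordinate frame for SM and FFDM boxes, the IoU matching rule, the 0.5 threshold and all function names are assumptions, and the boxes are made up.

```python
# Hypothetical per-lesion complementarity analysis (NOT the authors' protocol).
# ASSUMPTIONS: boxes are (x_min, y_min, x_max, y_max) in a shared coordinate frame,
# and a lesion counts as "captured" when some prediction has IoU >= 0.5 with it.
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def captured(expert: List[Box], preds: List[Box], thr: float = 0.5) -> Set[int]:
    """Indices of expert-marked lesions matched by at least one prediction."""
    return {i for i, e in enumerate(expert) if any(iou(e, p) >= thr for p in preds)}


def false_positives(expert: List[Box], preds: List[Box], thr: float = 0.5) -> List[int]:
    """Indices of predictions that match no expert-marked lesion."""
    return [j for j, p in enumerate(preds) if all(iou(e, p) < thr for e in expert)]


def complementarity(expert: List[Box], preds_sm: List[Box], preds_ffdm: List[Box],
                    thr: float = 0.5) -> Dict[str, Set[int]]:
    """Split expert lesions by which single-modality model captures them."""
    sm = captured(expert, preds_sm, thr)
    ffdm = captured(expert, preds_ffdm, thr)
    return {
        "both": sm & ffdm,
        "sm_only": sm - ffdm,
        "ffdm_only": ffdm - sm,
        "neither": set(range(len(expert))) - sm - ffdm,
    }


if __name__ == "__main__":
    expert = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]  # made-up boxes
    preds_sm = [(1, 1, 10, 10), (50, 50, 59, 59)]
    preds_ffdm = [(0, 0, 9, 9), (21, 21, 30, 30), (70, 70, 80, 80)]
    print(complementarity(expert, preds_sm, preds_ffdm))
    print("FFDM false positives:", false_positives(expert, preds_ffdm))
```

In this made-up example each model captures a lesion the other misses ("sm_only" and "ffdm_only" are both non-empty), which is the kind of complementarity Figure 1 illustrates, and the unmatched FFDM box is reported as a false positive in the spirit of Figure S5.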
