From 2D to 3D Without Extra Baggage: Data-Efficient Cancer Detection in Digital Breast Tomosynthesis
Yen Nhi Truong Vu, Dan Guo, Sripad Joshi, Harshit Kumar, Jason Su, Thomas Paul Matthews
TL;DR
This paper tackles data scarcity in Digital Breast Tomosynthesis (DBT) by introducing M&M-3D, an extension of the FFDM-pretrained 2D detector M&M that enables learnable 3D reasoning without adding any new parameters. It constructs sparse 3D proposals spanning all DBT slices and repeatedly mixes them with slice-level features through six cascade heads. Malignancy-guided weights $w_{i,s}$ fuse the slice information for each proposal $i$ into a coherent 3D representation and yield implicit $z$-axis localization via $z_i = \arg\max_s w_{i,s}$. M&M-3D is data-efficient: it outperforms 2D projection and slice-based baselines by up to 54% in localization and 10% in classification, matches or surpasses complex 3D reasoning methods in low-data regimes, and achieves state-of-the-art results on the BCS-DBT benchmark after finetuning. The approach preserves FFDM transferability, generalizes across datasets, and aligns with radiologists’ workflows by focusing supervision on the most suspicious slice for each finding. Overall, M&M-3D offers a scalable, data-efficient path toward unified 2D-3D learning for cancer detection in DBT.
Abstract
Digital Breast Tomosynthesis (DBT) enhances finding visibility for breast cancer detection by providing volumetric information that reduces the impact of overlapping tissues; however, limited annotated data has constrained the development of deep learning models for DBT. To address data scarcity, existing methods attempt to reuse 2D full-field digital mammography (FFDM) models by either flattening DBT volumes or processing slices individually, thus discarding volumetric information. Alternatively, 3D reasoning approaches introduce complex architectures that require more DBT training data. To overcome these drawbacks, we propose M&M-3D, an architecture that enables learnable 3D reasoning while remaining parameter-free relative to its FFDM counterpart, M&M. M&M-3D constructs malignancy-guided 3D features, and 3D reasoning is learned by repeatedly mixing these 3D features with slice-level information. This is achieved by modifying operations in M&M without adding parameters, thus enabling direct weight transfer from FFDM. Extensive experiments show that M&M-3D surpasses 2D projection and 3D slice-based methods by 11-54% for localization and 3-10% for classification. Additionally, M&M-3D outperforms complex 3D reasoning variants by 20-47% for localization and 2-10% for classification in the low-data regime, while matching their performance in the high-data regime. On the popular BCS-DBT benchmark, M&M-3D outperforms the previous top baseline by 4% for classification and 10% for localization.
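As an illustration of the malignancy-guided fusion and the implicit localization rule $z_i = \arg\max_s w_{i,s}$ mentioned in the TL;DR, the following is a minimal sketch for a single proposal. It is not the authors' implementation: the function name `fuse_slices`, the use of per-slice malignancy logits, and the softmax normalization of the weights are all assumptions made for illustration, since the text above does not specify how $w_{i,s}$ is computed.

```python
import math

def fuse_slices(slice_features, malignancy_logits):
    """Fuse per-slice features of one proposal into a single 3D feature.

    slice_features: list of S feature vectors (each a list of floats).
    malignancy_logits: list of S floats, one hypothetical score per slice.
    Returns (fused_feature, weights, z_index).
    """
    # Assumption: weights are a softmax over slices of the malignancy logits.
    peak = max(malignancy_logits)
    exps = [math.exp(x - peak) for x in malignancy_logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of slice features; no learnable parameters are involved.
    dim = len(slice_features[0])
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, slice_features))
        for d in range(dim)
    ]

    # Implicit z-axis localization: the slice with the largest weight.
    z_index = max(range(len(weights)), key=weights.__getitem__)
    return fused, weights, z_index

# Toy example: three slices, two feature channels; slice 1 looks most suspicious.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [0.0, 2.0, 0.0]
fused, weights, z = fuse_slices(features, logits)
```

Because the fusion in this sketch is only a normalized weighted sum, it introduces no weights of its own, which is in the spirit of the paper's claim that 3D reasoning is added without new parameters.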
