Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification
Daniel G. P. Petrini, Hae Yong Kim
TL;DR
This work investigates how transfer learning, image resolution, and multi-view integration affect CNN-based mammogram classification. By evaluating patch-based pretraining, diverse backbones, learn-to-resize versus fixed resizing, and single- versus two-view architectures on CBIS-DDSM and VinDr-Mammo, it achieves new state-of-the-art AUCs, notably 0.8658 with two-view fusion on CBIS-DDSM and 0.8511 on VinDr-Mammo. Key findings include: patch-based pretraining is not universally beneficial (helpful on high-quality VinDr-Mammo but not CBIS-DDSM), larger input resolutions improve performance, and two-view classifiers substantially outperform single-view approaches. These results provide concrete design recommendations for robust, scalable mammogram analysis tools and underscore the practical value of multi-view strategies in breast cancer screening. The accompanying open-source code and models facilitate reproducibility and ongoing advancement in AI-assisted mammography.
Abstract
Mammography, an X-ray-based imaging technique, remains central to the early detection of breast cancer. Recent advances in artificial intelligence have enabled increasingly sophisticated computer-aided diagnostic methods, which have evolved from patch-based classifiers to whole-image approaches and then to multi-view architectures that jointly analyze complementary projections. Despite this progress, several design questions remain open. In this study, we systematically investigate five of them: (1) the role of patch classifiers in performance, (2) the transferability of natural-image-trained backbones, (3) the advantages of learn-to-resize over conventional downscaling, (4) the contribution of multi-view integration, and (5) the robustness of findings across varying image quality. Beyond benchmarking, our experiments demonstrate clear performance gains over prior work. On the CBIS-DDSM dataset, we improved single-view AUC from 0.8153 to 0.8343 and multi-view AUC from 0.8483 to 0.8658. Using a new comparative method, we also observed a 0.0217 AUC increase when extending from single-view to multi-view analysis. On the complete VinDr-Mammo dataset, the multi-view approach achieved a 0.0492 AUC increase over the single-view approach, reaching 0.8511 AUC overall. These results establish new state-of-the-art benchmarks and provide clear evidence of the advantages of multi-view architectures for mammogram interpretation. In addition to these performance results, our analysis offers principled insights into model design and transfer learning strategies, contributing to the development of more accurate and reliable breast cancer screening tools. The inference code and trained models are publicly available at https://github.com/dpetrini/multiple-view.
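
As a reading aid, the CBIS-DDSM figures quoted above can be turned into explicit gains with a few lines of Python. This is only an arithmetic sketch over the numbers stated in the abstract; the dictionary keys and the helper name `auc_gain` are illustrative assumptions and are not taken from the released code.

```python
# AUC values quoted in the abstract for CBIS-DDSM.
# Key names and the helper below are illustrative, not from the authors' repository.
PRIOR = {"single_view": 0.8153, "multi_view": 0.8483}
THIS_WORK = {"single_view": 0.8343, "multi_view": 0.8658}

# Single-to-multi-view gain as reported in the abstract ("new comparative method").
REPORTED_VIEW_GAIN = 0.0217


def auc_gain(before: float, after: float) -> float:
    """Return the absolute AUC difference, rounded to four decimals."""
    return round(after - before, 4)


# Improvement of this work over the prior results, per setting.
gains_over_prior = {k: auc_gain(PRIOR[k], THIS_WORK[k]) for k in PRIOR}

# Plain subtraction of the two headline AUCs of this work.
naive_view_gap = auc_gain(THIS_WORK["single_view"], THIS_WORK["multi_view"])

if __name__ == "__main__":
    for setting, gain in gains_over_prior.items():
        print(f"{setting}: +{gain:.4f} AUC over prior work")
    print(f"headline multi-view minus single-view AUC: {naive_view_gap:.4f}")
    print(f"reported single-to-multi-view gain: {REPORTED_VIEW_GAIN:.4f}")
```

Note that the plain difference between the two headline AUCs does not equal the reported 0.0217 gain. The abstract attributes that figure to a separate comparative method, so the two numbers should not be expected to coincide.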
