MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer
Sushmita Sarker, Prithul Sarker, George Bebis, Alireza Tavakkoli
TL;DR
MV-Swin-T addresses the lack of multi-view modelling in mammography with a purely transformer-based network that fuses the ipsilateral craniocaudal (CC) and mediolateral oblique (MLO) views using a Multi-Head Dynamic Attention (MDA) mechanism within fixed and shifted windows. The Omni-Attention blocks enable both self- and cross-view interactions in local windows, and fusion is applied after stage 2 to balance context and efficiency. Evaluations on CBIS-DDSM and VinDr-Mammo show that MV-Swin-T outperforms the single-view Swin-T, particularly on VinDr-Mammo and with 384×384 inputs, demonstrating the viability of fully transformer-based multi-view mammography. The work points to future directions in scaling to larger datasets and in smoother clinical integration.
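The fixed and shifted window scheme summarized above can be illustrated with a minimal sketch on a toy integer grid standing in for a feature map. The helper names (`cyclic_shift`, `window_partition`, `paired_windows`) are hypothetical. The sketch assumes Swin-style shifting (a cyclic roll by half the window size) and CC and MLO feature maps of equal size whose windows are paired by index. It omits the attention mask that Swin applies to wrapped-around tokens and is not the authors' implementation.

```python
from typing import List, Tuple

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll a 2D grid up and left by `shift` cells (same effect as torch.roll with shifts=(-shift, -shift))."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_partition(grid: Grid, window: int) -> List[List[int]]:
    """Split an H x W grid into non-overlapping window x window tiles, each flattened row-major."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must be divisible by the window size"
    tiles = []
    for wi in range(0, h, window):
        for wj in range(0, w, window):
            tiles.append([grid[wi + i][wj + j] for i in range(window) for j in range(window)])
    return tiles


def paired_windows(cc: Grid, mlo: Grid, window: int, shifted: bool) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: pair the k-th CC window with the k-th MLO window.

    Assumes both views share the same spatial size; shifted blocks roll by window // 2.
    """
    shift = window // 2 if shifted else 0
    cc_tiles = window_partition(cyclic_shift(cc, shift), window)
    mlo_tiles = window_partition(cyclic_shift(mlo, shift), window)
    return list(zip(cc_tiles, mlo_tiles))


if __name__ == "__main__":
    cc = [[r * 4 + c for c in range(4)] for r in range(4)]
    mlo = [[v + 100 for v in row] for row in cc]
    print(paired_windows(cc, mlo, window=2, shifted=False)[0])
    print(paired_windows(cc, mlo, window=2, shifted=True)[0])
```

Each pair returned by `paired_windows` would form one joint attention context. Alternating unshifted and shifted partitions lets tokens that sit on a window border in one block share a window in the next.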
Abstract
Traditional deep learning approaches for breast cancer classification have predominantly concentrated on single-view analysis. In clinical practice, however, radiologists examine all views within a mammography exam concurrently, leveraging the inherent correlations among these views to detect tumors effectively. Acknowledging the significance of multi-view analysis, some studies have introduced methods that process mammogram views independently, either through distinct convolutional branches or simple fusion strategies, inadvertently losing crucial inter-view correlations. In this paper, we propose an innovative multi-view network based exclusively on transformers to address challenges in mammographic image classification. Our approach introduces a novel shifted window-based dynamic attention block that effectively integrates multi-view information and promotes the coherent transfer of this information between views at the spatial feature map level. Furthermore, we conduct a comprehensive comparative analysis of the performance and effectiveness of transformer-based models under diverse settings, employing the CBIS-DDSM and VinDr-Mammo datasets. Our code is publicly available at https://github.com/prithuls/MV-Swin-T.
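Information transfer between views at the spatial feature map level can be made concrete with a single-head sketch. In this sketch, every token of a CC window attends over the union of that window and the corresponding MLO window, and vice versa. This is an illustrative assumption rather than the paper's Multi-Head Dynamic Attention, which uses multiple heads and learned components not reproduced here. The hypothetical `joint_window_attention` helper uses identity query/key/value projections and plain scaled dot-product attention.

```python
import math
from typing import List, Tuple

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]


def joint_window_attention(win_a: List[Vec], win_b: List[Vec]) -> Tuple[List[Vec], List[Vec]]:
    """Hypothetical self- plus cross-view attention inside one pair of aligned windows.

    Tokens of view A attend over A's own window (self-view) and B's window (cross-view),
    and vice versa. Identity Q/K/V projections are an assumption made for brevity.
    """
    context = win_a + win_b
    out_a = [attend(t, context, context) for t in win_a]
    out_b = [attend(t, context, context) for t in win_b]
    return out_a, out_b


if __name__ == "__main__":
    cc_window = [[1.0, 0.0]]
    mlo_window = [[0.0, 1.0]]
    print(joint_window_attention(cc_window, mlo_window))
```

Restricting `context` to `win_a` alone would reduce this to ordinary window self-attention. Including `win_b` is what allows one view's features to enter the other view's representation.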
