Mammo-Clustering: A Multi-views Tri-level Information Fusion Context Clustering Framework for Localization and Classification in Mammography
Shilong Yang, Chulong Zhang, Qi Zang, Juan Yu, Liang Zeng, Xiao Luo, Yexuan Xing, Xin Pan, Qi Li, Xiaokun Liang, Yaoqin Xie
TL;DR
This paper tackles the difficulty of detecting and localizing breast cancer in high-resolution, multi-view mammograms by replacing CNN- and ViT-centric pipelines with a framework based on Context Clustering. It introduces Tri-level Information Fusion (TIFF), which combines global, feature-based local, and patch-based local information, and uses weakly supervised multi-view learning to localize lesions with minimal annotation. On Vindr-Mammo and CBIS-DDSM, the method achieves state-of-the-art AUCs of 0.828 and 0.805, respectively, with strong localization and competitive model efficiency, supported by comprehensive ablations and ROC analyses. By combining context clustering, patch-based ROI selection, and attention-guided fusion to extract as much information as possible from very high-resolution mammograms, the approach points toward scalable, cost-effective screening in clinical settings.
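The patch-based ROI selection and attention-guided fusion summarized above can be illustrated with a small, hypothetical sketch. It is not the paper's TIFF module. `top_k_patches` picks ROI patches from an assumed per-patch score map (for example, a weakly supervised activation map). `attention_fuse` then combines three assumed branch embeddings (global, feature-based local, patch-based local) using softmax weights computed from scaled dot products against an assumed query vector. All function names, the scoring rule, and the toy numbers are our own illustrations.

```python
import math
from typing import List, Tuple

Vector = List[float]


def top_k_patches(score_map: List[List[float]], k: int) -> List[Tuple[int, int]]:
    """Return (row, col) indices of the k highest-scoring patches (hypothetical ROI selection)."""
    cells = [(s, r, c) for r, row in enumerate(score_map) for c, s in enumerate(row)]
    cells.sort(key=lambda t: t[0], reverse=True)  # stable sort: ties keep raster order
    return [(r, c) for _, r, c in cells[:k]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branches: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Weight each branch by softmax(scaled dot product with query) and sum (hypothetical fusion)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * b for q, b in zip(query, br)) / scale for br in branches]
    weights = softmax(scores)
    dim = len(branches[0])
    fused = [sum(w * br[k] for w, br in zip(weights, branches)) for k in range(dim)]
    return fused, weights


# Toy 3x3 patch score map, standing in for an activation map (made-up values).
score_map = [[0.1, 0.2, 0.1],
             [0.3, 0.9, 0.7],
             [0.0, 0.1, 0.2]]
rois = top_k_patches(score_map, k=2)
print(rois)  # [(1, 1), (1, 2)]

global_feat = [0.2, 0.4]      # whole-image embedding (made-up)
feature_local = [0.6, 0.1]    # embedding at high-activation feature locations (made-up)
patch_local = [0.9, 0.8]      # embedding of the selected ROI patches (made-up)
fused, weights = attention_fuse([global_feat, feature_local, patch_local], query=[1.0, 1.0])
```

With softmax weighting, the patch-based branch dominates when its embedding aligns with the query, while the global and feature-based branches still contribute. A trained model would typically learn the query, or use a gating network, rather than fix it by hand as done here.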
Abstract
Breast cancer is a significant global health issue, and interpreting breast images has long been challenging. Mammograms typically have extremely high resolution, while lesions occupy only a very small area. Down-sampling in neural networks can easily discard microcalcifications or other subtle structures, and conventional neural network architectures struggle to avoid this loss. To tackle these challenges, we propose a Context Clustering Network with triple information fusion. First, we find that, compared with CNNs and transformers, Context Clustering methods (1) are more computationally efficient and (2) can more easily associate structural or pathological features, which makes them well suited to clinical mammography tasks. Second, we propose a triple information fusion mechanism that integrates global information, feature-based local information, and patch-based local information. The proposed approach is rigorously evaluated on two public datasets, Vindr-Mammo and CBIS-DDSM, using five independent splits to ensure statistical robustness. Our method achieves an AUC of 0.828 on Vindr-Mammo and 0.805 on CBIS-DDSM, outperforming the next-best method by 3.1% and 2.4%, respectively. These improvements are statistically significant (p < 0.05), underscoring the benefits of the Context Clustering Network with triple information fusion. Overall, our Context Clustering framework shows strong potential as a scalable, cost-effective solution for large-scale mammography screening, enabling more efficient and accurate breast cancer detection. Our implementation is available at https://github.com/Sohyu1/Mammo_Clustering.
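The following pure-Python sketch shows what a single context-clustering step does. It follows the general pattern of the Context Clusters operation (Ma et al., ICLR 2023) in a simplified form and is not the Mammo-Clustering implementation. We make several assumptions:

- Centers are proposed by average-pooling contiguous runs of points, a 1-D stand-in for spatial pooling.
- Similarities are `sigmoid(alpha * cos + beta)` with fixed, hypothetical `alpha` and `beta`.
- Each point joins its single most similar center.
- The learned value and output projections are omitted, so the point features themselves serve as values.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def propose_centers(points: List[Vector], num_centers: int) -> List[Vector]:
    """Average-pool contiguous runs of points into at most num_centers centers."""
    n, d = len(points), len(points[0])
    size = math.ceil(n / num_centers)
    centers = []
    for start in range(0, n, size):
        chunk = points[start:start + size]
        centers.append([sum(p[k] for p in chunk) / len(chunk) for k in range(d)])
    return centers


def context_cluster(points: List[Vector], num_centers: int = 2,
                    alpha: float = 1.0, beta: float = 0.0) -> Tuple[List[Vector], List[int]]:
    """One simplified clustering step: propose centers, assign, aggregate, dispatch."""
    d = len(points[0])
    centers = propose_centers(points, num_centers)
    # Similarity squashed into (0, 1): sigmoid(alpha * cos + beta); alpha/beta are fixed here.
    sims = [[1.0 / (1.0 + math.exp(-(alpha * cosine(p, c) + beta))) for c in centers]
            for p in points]
    # Hard assignment: each point joins its most similar center.
    assign = [max(range(len(centers)), key=row.__getitem__) for row in sims]
    out = [list(p) for p in points]
    for j, center in enumerate(centers):
        members = [i for i, a in enumerate(assign) if a == j]
        if not members:
            continue
        # Aggregate: similarity-weighted mean of members plus the center itself (weight 1).
        norm = 1.0 + sum(sims[i][j] for i in members)
        agg = [(center[k] + sum(sims[i][j] * points[i][k] for i in members)) / norm
               for k in range(d)]
        # Dispatch: add the aggregate back to each member, scaled by its similarity.
        for i in members:
            out[i] = [points[i][k] + sims[i][j] * agg[k] for k in range(d)]
    return out, assign


points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
updated, assignment = context_cluster(points, num_centers=2)
print(assignment)  # [0, 0, 1, 1]
```

Each point is compared only with a small set of pooled centers, not with every other point. The cost therefore scales with points × centers rather than points × points, which is the usual argument for the efficiency of context clustering over full self-attention.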
