Mammo-Clustering: A Multi-views Tri-level Information Fusion Context Clustering Framework for Localization and Classification in Mammography

Shilong Yang, Chulong Zhang, Qi Zang, Juan Yu, Liang Zeng, Xiao Luo, Yexuan Xing, Xin Pan, Qi Li, Xiaokun Liang, Yaoqin Xie

TL;DR

This paper tackles the difficulty of detecting and localizing breast cancer in high-resolution, multi-view mammograms by replacing CNN/ViT-centric pipelines with a Context Clustering-based framework. It introduces Tri-level Information Fusion (TIFF), which combines global, feature-based local, and patch-based local information, and uses weakly supervised, multi-view learning to localize lesions with minimal annotation. On Vindr-Mammo and CBIS-DDSM, the method reports state-of-the-art AUCs of 0.828 and 0.805, respectively, together with strong localization and competitive model efficiency, supported by ablations and ROC analyses. The authors position the approach as a scalable, cost-effective option for clinical screening: context clustering, patch-based ROI selection, and attention-guided fusion are used to retain as much information as possible from very high-resolution mammograms.
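As a rough illustration of patch-based ROI selection, the sketch below picks the window with the highest total response on a coarse saliency grid and maps it back to full-resolution pixel coordinates. The selection rule, the function names `select_top_patch` and `to_full_resolution`, the toy grid, and the scale factor of 32 are all assumptions made for illustration; the paper's ROISelectModel may select patches differently.

```python
def select_top_patch(saliency, patch_h, patch_w):
    """Return (row, col, score) of the window with the highest summed saliency.

    `saliency` is a coarse 2D grid (list of lists) standing in for global
    information; this exhaustive search is an assumed selection rule.
    """
    rows, cols = len(saliency), len(saliency[0])
    best = None
    for r in range(rows - patch_h + 1):
        for c in range(cols - patch_w + 1):
            score = sum(
                saliency[r + i][c + j]
                for i in range(patch_h)
                for j in range(patch_w)
            )
            if best is None or score > best[2]:
                best = (r, c, score)
    return best


def to_full_resolution(row, col, patch_h, patch_w, scale):
    """Map a window on the coarse grid to a pixel box (top, left, bottom, right)."""
    return (row * scale, col * scale, (row + patch_h) * scale, (col + patch_w) * scale)


# Toy 4x4 saliency grid (hypothetical values, not model output).
saliency = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.2, 0.9, 0.8],
    [0.0, 0.1, 0.7, 0.6],
    [0.0, 0.0, 0.1, 0.0],
]
row, col, score = select_top_patch(saliency, patch_h=2, patch_w=2)
box = to_full_resolution(row, col, 2, 2, scale=32)
print(row, col, box)
```

Cropping the original image at `box` rather than a down-sampled copy is what lets a local branch see fine structures such as microcalcifications at native resolution.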

Abstract

Breast cancer is a significant global health issue, and diagnosis from breast imaging has always been challenging. Mammography images typically have extremely high resolution, with lesions occupying only a very small area. Down-sampling in neural networks can easily lead to the loss of microcalcifications or subtle structures, making it difficult for traditional neural network architectures to address these issues. To tackle these challenges, we propose a Context Clustering Network with triple information fusion. First, compared to CNNs or transformers, we find that Context Clustering methods (1) are more computationally efficient and (2) can more easily associate structural or pathological features, making them suitable for the clinical tasks of mammography. Second, we propose a triple information fusion mechanism that integrates global information, feature-based local information, and patch-based local information. The proposed approach is evaluated on two public datasets, Vindr-Mammo and CBIS-DDSM, using five independent splits to ensure statistical robustness. Our method achieves an AUC of 0.828 on Vindr-Mammo and 0.805 on CBIS-DDSM, outperforming the next best method by 3.1% and 2.4%, respectively. These improvements are statistically significant (p<0.05), underscoring the benefits of the Context Clustering Network with triple information fusion. Overall, our Context Clustering framework demonstrates strong potential as a scalable and cost-effective solution for large-scale mammography screening, enabling more efficient and accurate breast cancer detection. Our code is available at https://github.com/Sohyu1/Mammo_Clustering.
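For readers unfamiliar with the metric, the following minimal sketch shows how an AUC can be computed from labels and scores and then summarized over several splits. It uses the standard pairwise-ranking definition of AUC; the five tiny splits are synthetic toy data, not results from the paper, and the helper name `roc_auc` is hypothetical.

```python
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Five synthetic (labels, scores) splits, purely illustrative.
toy_splits = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),
    ([1, 0, 0, 1], [0.7, 0.7, 0.1, 0.9]),
    ([0, 1, 1, 0], [0.5, 0.4, 0.8, 0.3]),
    ([1, 0, 1, 0], [0.6, 0.2, 0.3, 0.4]),
]
split_aucs = [roc_auc(y, s) for y, s in toy_splits]
mean_auc = statistics.mean(split_aucs)
std_auc = statistics.stdev(split_aucs)  # sample standard deviation across splits
print(split_aucs, mean_auc, std_auc)
```

Reporting the mean together with the spread across splits is what makes a comparison between methods, and any significance test on it, meaningful.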

Paper Structure (36 sections, 8 equations, 6 figures, 6 tables)


Figures (6)

  • Figure 1: Context Clustering visualization. In each of panels a to d, the left half shows the original mammogram with the suspicious lesions annotated, and the right half shows the Context Clustering visualization, akin to a CNN heatmap or a ViT attention map, with the suspicious lesion locations also outlined. The figure shows that the Context Clustering approach identifies and groups suspicious lesion areas in mammograms.
  • Figure 2: Architecture of the proposed model. Images from four views are enhanced into point sets and processed by a multi-level Context Clustering module, Global Coc, to extract global information. This module consists of point reducers and context cluster blocks. The ROISelectModel uses this global information to select patch-based images, which are processed by a second Context Clustering module, Local Coc, to generate patch-based local information. This is fused with feature-based local information derived from the global information to produce the local information. Local and global information are then combined into single-view fusion information. Finally, the fusion information from each view is integrated across views and regressed to produce the final output.
  • Figure 3: A visual explanation of Context Clustering. The clustering consists of five components: selecting central anchor points, identifying the neighbors of each anchor, calculating features for each anchor, performing similarity analysis based on these anchors, and representing all clusters on the chart.
  • Figure 4: Visualization of the patch-based images extracted by the model. The green box on the mammogram marks the location of the suspicious lesion, and the blue box marks the patch-based image selected by the model. The patches extracted by the model localize the suspicious regions well, and the magnified images clearly show calcifications and masses.
  • Figure 5: Comparison of ROC curves of different models on two public datasets. Panels a and b compare the ROC curves of our model with those of other single-view and multi-view architectures on the Vindr-Mammo dataset. Panels c and d present the same comparison on the CBIS-DDSM dataset.
  • ...and 1 more figure
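The clustering steps listed in the Figure 3 caption can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: it assumes centers are the mean of a predefined group of neighboring points, uses cosine similarity for the similarity analysis, assigns each point to its most similar center, and aggregates each cluster with sigmoid-weighted averaging. The scalars `alpha` and `beta`, the function names, and the toy points are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def propose_centers(points, groups):
    """Each center (anchor) is the mean feature of one group of neighboring points."""
    dim = len(points[0])
    return [
        [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        for idx in groups
    ]


def assign(points, centers):
    """Assign every point to its most similar center; return (cluster, similarity)."""
    result = []
    for p in points:
        sims = [cosine(p, c) for c in centers]
        k = max(range(len(centers)), key=sims.__getitem__)
        result.append((k, sims[k]))
    return result


def aggregate(points, centers, assignment, alpha=1.0, beta=0.0):
    """Similarity-weighted average of each cluster's members together with its center."""
    aggregated = []
    for k, center in enumerate(centers):
        total, norm = list(center), 1.0
        for p, (cluster, sim) in zip(points, assignment):
            if cluster == k:
                w = sigmoid(alpha * sim + beta)  # alpha, beta: assumed fixed scalars
                total = [t + w * x for t, x in zip(total, p)]
                norm += w
        aggregated.append([t / norm for t in total])
    return aggregated


# Toy point set: four 2-D feature vectors and two neighbor groups (hypothetical).
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = [[0, 1], [2, 3]]
centers = propose_centers(points, groups)
assignment = assign(points, centers)
clusters = [k for k, _ in assignment]
cluster_features = aggregate(points, centers, assignment)
print(clusters, cluster_features)
```

In a full model, the aggregated cluster feature would typically be dispatched back to the member points so that each point is updated with context from its cluster; that step is omitted here for brevity.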