Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification
Yunlong Zhang, Honglin Li, Yuxuan Sun, Sunyi Zheng, Chenglu Zhu, Lin Yang
TL;DR
This work tackles overfitting in MIL-based whole slide image classification by addressing attention value concentration. It introduces Attention-Challenging MIL (ACMIL), which combines two components: Multiple Branch Attention (MBA), which uses $M$ attention branches with semantic and diversity regularization to capture a wider range of discriminative instance patterns, and Stochastic Top-$K$ Instance Masking (STKIM), which masks Top-$K$ attended instances with probability $p$ to temper the dominance of a few highly attended instances. The approach builds on ABMIL and shows consistent gains on the CAMELYON16, BRACS, and LBC datasets with both ImageNet-pretrained ResNet-18 and SSL-pretrained ViT-S/16 backbones; heatmap and UMAP visualizations illustrate reduced attention concentration and improved generalization. Ablation studies show that the diversity loss is necessary and that combining MBA with STKIM is effective, while comparisons with strong baselines and additional analyses support ACMIL’s interpretability and practical value for robust WSI classification. The authors release the code publicly and provide detailed experimental settings and analyses of the method’s benefits and limitations.
Abstract
When Multiple Instance Learning (MIL) methods are applied to Whole Slide Image (WSI) classification, attention mechanisms often focus on a small subset of discriminative instances, a behavior that is closely linked to overfitting. To mitigate overfitting, we present Attention-Challenging MIL (ACMIL). ACMIL combines two techniques, each motivated by a separate analysis of attention value concentration. First, a UMAP of instance features reveals various patterns among discriminative instances, of which existing attention mechanisms capture only some. To remedy this, we introduce Multiple Branch Attention (MBA), which captures more discriminative instances using multiple attention branches. Second, examining the cumulative value of the Top-K attention scores indicates that a tiny number of instances receive the majority of the attention. In response, we present Stochastic Top-K Instance Masking (STKIM), which masks out a portion of the instances with Top-K attention values and allocates their attention values to the remaining instances. Extensive experiments on three WSI datasets with two pre-trained backbones show that ACMIL outperforms state-of-the-art methods. Additionally, through heatmap and UMAP visualizations, this paper illustrates ACMIL's effectiveness in suppressing attention value concentration and overcoming the overfitting challenge. The source code is available at https://github.com/dazhangyu123/ACMIL.
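To make the STKIM idea concrete, the following is a minimal sketch in plain Python and is not the released implementation. It assumes that masking is applied to pre-softmax attention scores by setting them to negative infinity, so that the softmax renormalization reallocates the masked attention to the remaining instances, and that each Top-K instance is masked independently with probability p. The function names `softmax` and `stkim_attention`, the list-based input, and the rule of always leaving one instance unmasked are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stkim_attention(logits, k, p, rng=random):
    """Attention weights after stochastically masking Top-K instances.

    logits: pre-softmax attention scores for a non-empty bag (assumed input).
    k:      number of highest-scoring instances that are masking candidates.
    p:      probability of masking each candidate (assumed independent).
    rng:    any object with a random() method, e.g. random.Random(seed).
    """
    n = len(logits)
    # Assumption of this sketch: always keep at least one instance unmasked,
    # so the softmax below is well defined.
    k = min(k, n - 1)
    top_k = sorted(range(n), key=lambda i: logits[i], reverse=True)[:k]

    masked = list(logits)
    for i in top_k:
        if rng.random() < p:
            masked[i] = -math.inf  # masked instance gets zero attention

    # Renormalizing over the surviving instances reallocates the attention
    # that the masked instances would otherwise have received.
    return softmax(masked)
```

With `p = 0` the function reduces to ordinary softmax attention, which is how one would presumably use it at inference time; with `p > 0` during training, the most attended instances are intermittently removed, so the remaining instances must carry the bag-level prediction.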
