Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Yunlong Zhang; Honglin Li; Yuxuan Sun; Sunyi Zheng; Chenglu Zhu; Lin Yang

TL;DR

This work tackles overfitting in MIL-based whole slide image classification by addressing attention value concentration. It introduces Attention-Challenging MIL (ACMIL), which combines Multiple Branch Attention (MBA) to diversify discriminative instance patterns with $M$ branches and semantic/diversity regularization, and Stochastic Top-$K$ Instance Masking (STKIM) to temper dominance by a few highly attended instances with masking probability $p$ and Top-$K$ control. The approach builds on ABMIL and demonstrates consistent gains across CAMELYON16, BRACS, and LBC datasets using both ImageNet-pretrained ResNet-18 and SSL-pretrained ViT-S/16 backbones, supported by heatmap and UMAP visualizations illustrating reduced attention concentration and improved generalization. Ablation studies show the necessity of the diversity loss and the effectiveness of combining MBA with STKIM, while comparisons to strong baselines and additional analyses reinforce ACMIL’s interpretability and practical impact for robust WSI classification. The authors release the public code and provide comprehensive experimental details and analyses to validate the method’s benefits and limitations.
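The MBA idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the shared projection `V`, the per-branch scoring vectors in `W`, the non-gated `tanh` attention, and the cosine-similarity form of the diversity penalty are all assumptions made for illustration. The summary only states that $M$ branches produce $M$ pattern features, that these are averaged into the bag feature, and that semantic and diversity regularization are applied.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mba_forward(H, V, W):
    """Multiple-branch attention pooling (illustrative).

    H: (N, D) instance features of one bag.
    V: (D, L) projection shared by all branches (assumption).
    W: (L, M) one scoring vector per branch (assumption).
    Returns attention (M, N), pattern features (M, D), bag feature (D,).
    """
    scores = np.tanh(H @ V) @ W        # (N, M) unnormalized scores
    A = softmax(scores, axis=0).T      # (M, N); each branch sums to 1
    patterns = A @ H                   # (M, D) one pattern feature per branch
    bag = patterns.mean(axis=0)        # mean over the M pattern features
    return A, patterns, bag

def diversity_penalty(A):
    # Hypothetical diversity term: mean pairwise cosine similarity
    # between branch attention vectors (lower means more diverse).
    M = A.shape[0]
    if M < 2:
        return 0.0
    unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    sim = unit @ unit.T
    return float(sim[np.triu_indices(M, k=1)].mean())

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 16))          # 50 instances, 16-dim features
V = rng.normal(size=(16, 8))
W = rng.normal(size=(8, 5))            # M = 5 branches
A, patterns, bag = mba_forward(H, V, W)
print(A.shape, patterns.shape, bag.shape, diversity_penalty(A))
```

In a training loop, a penalty of this kind would be added to the classification loss so that branches are discouraged from attending to the same instances.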

Abstract

In Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) classification, attention mechanisms often focus on a subset of discriminative instances, a behavior that is closely linked to overfitting. To mitigate overfitting, we present Attention-Challenging MIL (ACMIL). ACMIL combines two techniques, each motivated by a separate analysis of attention value concentration. First, UMAP of instance features reveals various patterns among discriminative instances, with existing attention mechanisms capturing only some of them. To remedy this, we introduce Multiple Branch Attention (MBA) to capture more discriminative instances using multiple attention branches. Second, an examination of the cumulative value of Top-K attention scores indicates that a tiny number of instances account for the majority of attention. In response, we present Stochastic Top-K Instance Masking (STKIM), which masks out a portion of the instances with Top-K attention values and allocates their attention values to the remaining instances. Extensive experiments on three WSI datasets with two pre-trained backbones show that ACMIL outperforms state-of-the-art methods. Additionally, through heatmap and UMAP visualizations, this paper illustrates ACMIL's effectiveness in suppressing attention value concentration and overcoming the overfitting challenge. The source code is available at https://github.com/dazhangyu123/ACMIL.
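The STKIM step described in the abstract can be sketched as follows. This is a hedged illustration rather than the released code: the function name `stkim`, masking on pre-softmax scores with `-inf`, and clamping `k` so that at least one instance survives are assumptions. Renormalizing with a softmax after masking is one simple way to hand the masked attention mass to the remaining instances.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array; -inf entries map to 0.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def stkim(scores, k, p, rng):
    """Stochastic Top-K instance masking on one bag (illustrative).

    scores: (N,) unnormalized attention scores.
    k:      number of top-scoring instances eligible for masking.
    p:      probability of masking each eligible instance.
    Returns the renormalized attention and the masked indices.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    k = min(k, n - 1)                  # assumption: keep at least one instance
    if k <= 0:
        return softmax(scores), np.array([], dtype=int)
    topk = np.argpartition(scores, -k)[-k:]   # indices of the k largest scores
    drop = topk[rng.random(k) < p]            # each is masked with probability p
    masked = scores.copy()
    masked[drop] = -np.inf             # masked instances receive zero attention
    return softmax(masked), drop

rng = np.random.default_rng(0)
scores = np.array([5.0, 4.0, 1.0, 0.0, -1.0])
attn, dropped = stkim(scores, k=2, p=0.5, rng=rng)
print(attn.round(3), sorted(dropped.tolist()))
```

Masking of this kind would be applied during training only; at inference the plain softmax over all instances is used, which corresponds to `p = 0` in this sketch.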

Paper Structure (26 sections, 9 equations, 15 figures, 7 tables)

Introduction
Related Work
Combating Overfitting in WSI Analysis
Over-Concentration of Attention Values
Method
ABMIL for WSI Classification
Multiple Branch Attention
Stochastic Top-K Instance Masking
Experiments
Experimental Details
WSI Classification Results
Localization Results
Ablation Study
Further Analysis
Conclusion
...and 11 more sections

Figures (15)

Figure 1: Validation loss and entropy of attention values throughout the training of ABMIL. The results are reported on LBC with SSL-pretrained features. There is a strong negative correlation between loss and entropy.
Figure 2: Comparison of AUC and entropy of attention values between ABMIL and ACMIL. Each point denotes the result of one seed on LBC with SSL-pretrained features. ACMIL achieves higher AUC and higher entropy than ABMIL.
Figure 3: Motivation of MBA. UMAP visualization (McInnes et al., 2018) of tumor instance features from the CAMELYON16 'test_113' case. There are various patterns/clusters among tumor instances, and relying on a single branch tends to capture only some of the clusters. Three instances are selected to exhibit their texture differences.
Figure 4: Overview of the proposed MBA (top view) and STKIM (bottom view). In the MBA, $M$ discriminative patterns are extracted from patch features using the attention operator regularized by semantic and diversity regularization terms. Then, the mean operator is applied to these $M$ pattern features to produce the bag feature, which is utilized for bag-level prediction. In the STKIM, instances with Top-K attention values are randomly masked with a probability $p$.
Figure 5: Motivation of STKIM. Accumulation of Top-K attention values. Instances with Top-K attention values account for the majority of attention. Results are derived from features extracted through supervised pretraining.
...and 10 more figures
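
The two quantities tracked in the captions of Figures 1, 2, and 5 (entropy of attention values and accumulated Top-K attention) can be computed for a single bag as in the sketch below. The function names are hypothetical, and the use of unnormalized Shannon entropy with the natural logarithm is an assumption, since the captions do not give the exact definition.

```python
import numpy as np

def attention_entropy(a, eps=1e-12):
    # Shannon entropy (natural log) of a normalized attention vector.
    # Low entropy means attention is concentrated on few instances.
    a = np.asarray(a, dtype=float)
    return float(-(a * np.log(a + eps)).sum())

def topk_cumulative(a, k):
    # Total attention held by the k most-attended instances.
    a = np.sort(np.asarray(a, dtype=float))[::-1]
    return float(a[:k].sum())

uniform = np.full(4, 0.25)
peaked = np.array([0.97, 0.01, 0.01, 0.01])
print(attention_entropy(uniform), attention_entropy(peaked))
print(topk_cumulative(uniform, 1), topk_cumulative(peaked, 1))
```

A uniform attention vector gives the maximum entropy for its length, while a peaked vector gives a low entropy and a Top-1 share close to 1, which is the concentration pattern the figures are concerned with.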
