Rethinking Attention-Based Multiple Instance Learning for Whole-Slide Pathological Image Classification: An Instance Attribute Viewpoint

Linghan Cai; Shenjin Huang; Ye Zhang; Jinpeng Lu; Yongbing Zhang

TL;DR

AttriMIL addresses the limitations of attention-based MIL for whole-slide pathology by introducing an explicit attribute scoring mechanism that measures per-instance contribution to bag predictions. It adds a spatial attribute constraint to model intra-slide patch correlations and an attribute ranking loss to capture inter-slide differences, all powered by a histopathology-adaptive backbone with multi-stage adapters. Across Camelyon16, TCGA-NSCLC, and UniToPatho, AttriMIL achieves state-of-the-art accuracy, F1, and AUC, with additional qualitative gains in tumor localization and robust OOD detection. The approach offers a practical, scalable pathway toward more interpretable and reliable computer-aided pathology systems.

Abstract

Multiple instance learning (MIL) is a robust paradigm for whole-slide pathological image (WSI) analysis, processing gigapixel-resolution images with slide-level labels. As pioneering efforts, attention-based MIL (ABMIL) and its variants have become increasingly popular because they handle clinical diagnosis and tumor localization simultaneously. However, the attention mechanism has limited ability to discriminate between instances, so it often misclassifies tissues and can impair MIL performance. This paper proposes an Attribute-Driven MIL (AttriMIL) framework to address these issues. Concretely, we dissect the calculation process of ABMIL and present an attribute scoring mechanism that effectively measures the contribution of each instance to the bag prediction, thereby quantifying instance attributes. Based on attribute quantification, we develop a spatial attribute constraint and an attribute ranking constraint to model instance correlations within and across slides, respectively. These constraints encourage the network to capture the spatial correlation and semantic similarity of instances, improving the ability of AttriMIL to distinguish tissue types and identify challenging instances. Additionally, AttriMIL employs a histopathology adaptive backbone that maximizes the pre-trained model's feature extraction capability for collecting pathological features. Extensive experiments on three public benchmarks demonstrate that AttriMIL outperforms existing state-of-the-art frameworks across multiple evaluation metrics. The implementation code is available at https://github.com/MedCAI/AttriMIL.
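To make the idea of "the contribution of each instance to the bag prediction" concrete, here is a minimal NumPy sketch of how an attention-pooled bag logit splits into per-instance terms when the bag classifier is linear. It is an illustration under simplifying assumptions, not the authors' exact attribute score: the attention is a single linear projection (ABMIL uses a small gated or tanh network), the classifier has no bias, and the names `abmil_contributions`, `v`, and `w_cls` are hypothetical.

```python
import numpy as np

def abmil_contributions(h, v, w_cls):
    """Decompose an attention-pooled bag logit into per-instance terms.

    h:     (N, D) instance features
    v:     (D,)   attention projection (assumed linear for simplicity)
    w_cls: (D,)   linear bag classifier weights (no bias, by assumption)
    """
    s = h @ v                          # raw attention logits, one per instance
    s = s - s.max()                    # shift for numerical stability
    alpha = np.exp(s) / np.exp(s).sum()  # softmax attention weights
    bag_logit = (alpha @ h) @ w_cls    # classify the attention-pooled feature
    # Linearity lets the same logit be written as a sum over instances.
    contrib = alpha * (h @ w_cls)
    return bag_logit, contrib, alpha

# Toy bag: instance 0 receives most of the attention but has a negative class score.
h = np.array([[2.0, -1.0],
              [0.0,  1.0]])
v = np.array([1.0, 0.0])
w_cls = np.array([0.0, 1.0])

bag_logit, contrib, alpha = abmil_contributions(h, v, w_cls)
print("attention:", alpha)
print("contributions:", contrib)
print("bag logit:", bag_logit)
```

In this toy bag the most-attended instance pushes the bag logit down, which illustrates why a high attention score alone does not mark an instance as positive.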

Paper Structure (29 sections, 12 equations, 8 figures, 3 tables)

Introduction
Related Work
Instance-based MIL on WSIs
Bag-based MIL on WSIs
Parameter-efficient Fine-tuning
Methodology
Preliminaries
Formulate MIL
Revisit ABMIL
Attribute Scoring Mechanism
Spatial Attribute Constraint
Attribute Ranking Constraint
Histopathology Adaptive Backbone
Experiments
Datasets and Evaluation Metrics
...and 14 more sections

Figures (8)

Figure 1: Illustration of ABMIL and our AttriMIL. For features, red cuboids denote positive attributes and blue cuboids denote negative attributes; for scores, redder colors represent higher scores and bluer colors indicate lower ones. Notably, for a positive bag, existing methods [abmil, clam, wang2023hard, qu2022dgmil, zhang2022dtfd] generally assume that the instances with high attention scores are positive instances.
Figure 2: Overview of the AttriMIL framework. AttriMIL crops an input WSI into patches and uses a histopathology adaptive backbone to extract instance features. It then generates instance attribute scores in each subtype branch (tumor and normal in the tumor detection task) using a multi-class attribute scoring mechanism. Each subtype branch treats WSIs of its own subtype as positive and WSIs of the other subtypes as negative. The spatial attribute constraint ("$\nabla$" denotes a differential operation) and the attribute ranking constraint ("$+$" denotes a weighted sum operation) are applied in the training stage. Next, AttriMIL performs score aggregation to obtain $C$ bag scores and then generates bag prediction probabilities. The instance attribute scores corresponding to the bag prediction are mapped for tumor localization.
Figure 3: Illustration of the histopathology adaptive backbone. In contrast to previous solutions, the histopathology adaptive backbone places feature adapters at different stages of the pre-trained network. Global pooling is used when training the current adapter.
Figure 4: Training loss curves and validation-set AUC under different loss constraints on the Camelyon16 dataset. $\alpha$ is set to 0.1 for the spatial attribute loss and $\beta$ is set to 0.001 for the attribute ranking loss.
Figure 5: Visual analyses of the spatial attribute constraint and the attribute ranking constraint. (a) shows a WSI from the Camelyon16 testing set, with tumor regions outlined by yellow curves. (b)-(e) show zoomed-in views of the boxed areas in (a). In (b)-(e), tumor areas identified by the model (instance attribute score greater than 0 in the tumor branch) are highlighted in red.
...and 3 more figures
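The Figure 2 caption describes the spatial attribute constraint as a differential operation ($\nabla$) applied to instance attribute scores. One simple way to picture such a constraint is a finite-difference smoothness penalty over the patch grid, sketched below. This is an assumption made for illustration and may differ from the loss used in the paper: the function name `spatial_smoothness_penalty`, the squared-difference form, the 4-neighbour layout, and the tissue `mask` are all hypothetical.

```python
import numpy as np

def spatial_smoothness_penalty(score_map, mask):
    """Mean squared difference between attribute scores of adjacent patches.

    score_map: (H, W) attribute scores arranged on the slide's patch grid
    mask:      (H, W) boolean, True where a tissue patch exists
    """
    # Finite differences along columns (horizontal neighbours).
    dx = score_map[:, 1:] - score_map[:, :-1]
    mx = mask[:, 1:] & mask[:, :-1]      # both neighbours must be valid
    # Finite differences along rows (vertical neighbours).
    dy = score_map[1:, :] - score_map[:-1, :]
    my = mask[1:, :] & mask[:-1, :]
    n_pairs = int(mx.sum() + my.sum())
    if n_pairs == 0:
        return 0.0                       # no adjacent tissue patches
    total = (dx ** 2 * mx).sum() + (dy ** 2 * my).sum()
    return float(total) / n_pairs

mask = np.ones((3, 3), dtype=bool)
flat = np.ones((3, 3))                              # spatially uniform scores
checker = np.indices((3, 3)).sum(axis=0) % 2 * 2 - 1  # alternating -1 / +1

penalty_flat = spatial_smoothness_penalty(flat, mask)
penalty_checker = spatial_smoothness_penalty(checker, mask)
print(penalty_flat, penalty_checker)
```

A uniform score map incurs no penalty, while a checkerboard of alternating scores is penalized heavily, which matches the intuition that neighbouring patches in a slide tend to share tissue type.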
