
A Spatially-Aware Multiple Instance Learning Framework for Digital Pathology

Hassan Keshvarikhojasteh, Mihail Tifrea, Sibylle Hess, Josien P. W. Pluim, Mitko Veta

TL;DR

The paper addresses weakly supervised WSI classification in digital pathology by enhancing ABMIL with explicit patch interactions. It introduces Global ABMIL (GABMIL) and the Spatial Information Mixing Module (SIMM), which uses BLOCK and GRID MLPMixer-style attention to encode local and global spatial dependencies while keeping ABMIL's efficiency. Across two public datasets (TCGA BRCA and TCGA LUNG), GABMIL consistently outperforms ABMIL and matches or surpasses Transformer-based TransMIL with substantially lower computational cost. The findings highlight the importance of modeling patch interactions in MIL for pathology and suggest directions for integrating spatial mixing directly into ABMIL attention mechanisms. The work provides a practical, scalable approach for improved WSI subtyping in breast and lung cancers, with code available online.

Abstract

Multiple instance learning (MIL) is a promising approach for weakly supervised classification of whole slide images (WSIs) in pathology. However, conventional MIL methods such as Attention-Based Deep Multiple Instance Learning (ABMIL) typically disregard the spatial interactions among patches that are crucial to pathological diagnosis. Recent methods such as Transformer-based MIL (TransMIL) incorporate spatial context and inter-patch relationships, but they do so by replacing ABMIL's Multi-Layer Perceptrons (MLPs) with Transformer layers, a fundamental architectural shift that substantially increases computational complexity. It therefore remains unclear whether explicitly modeling patch relationships yields similar performance gains within ABMIL itself. To address this question, we enhance the ABMIL framework with interaction-aware representations. Our proposed model, Global ABMIL (GABMIL), explicitly captures inter-instance dependencies while preserving computational efficiency. Experiments on two publicly available datasets for tumor subtyping in breast and lung cancers show that GABMIL improves over ABMIL by up to 7 percentage points in AUPRC and 5 percentage points in the Kappa score, with minimal or no additional computational overhead. These findings underscore the importance of incorporating patch interactions within MIL frameworks. Our code is available at https://github.com/tueimage/GABMIL.
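Why ABMIL ignores patch interactions follows from how its pooling works. In the original ABMIL formulation (Ilse et al., 2018), each patch feature h_k receives a score w^T tanh(V h_k), the scores are normalized with a softmax over patches to give weights a_k, and the slide embedding is z = sum_k a_k h_k (the gated variant multiplies tanh(V h_k) elementwise by sigmoid(U h_k)). The NumPy sketch below illustrates this; the function name `abmil_pool`, the dimensions, and the random initialization are illustrative assumptions, not the GABMIL code.

```python
import numpy as np

def abmil_pool(H, V, w, U=None):
    """Attention-based MIL pooling in the style of Ilse et al. (2018).

    H: (K, D) patch features of one slide; V: (L, D) and w: (L,) attention
    parameters; U: (L, D) optional gate for the gated-attention variant.
    Returns the slide embedding z of shape (D,) and attention weights a (K,).
    """
    hidden = np.tanh(H @ V.T)                          # (K, L)
    if U is not None:
        hidden = hidden / (1.0 + np.exp(-(H @ U.T)))   # tanh(.) * sigmoid(.)
    scores = hidden @ w                                # one score per patch
    scores = scores - scores.max()                     # numerically stable softmax
    a = np.exp(scores) / np.exp(scores).sum()          # softmax over patches
    return a @ H, a                                    # attention-weighted average

# Toy usage with random (hypothetical) parameters.
rng = np.random.default_rng(0)
K, D, L = 200, 512, 128
H = rng.normal(size=(K, D))
V, U = 0.05 * rng.normal(size=(L, D)), 0.05 * rng.normal(size=(L, D))
w = rng.normal(size=L)
z, a = abmil_pool(H, V, w, U)

# Each score depends only on its own patch, so shuffling the patches merely
# permutes the weights and leaves z unchanged: no patch sees its neighbours.
perm = rng.permutation(K)
z_perm, a_perm = abmil_pool(H[perm], V, w, U)
```

A slide-level classifier (for example, a linear layer on z) would follow. Per Figure 1, GABMIL leaves this pooling step in place and instead contextualizes the patch features before they reach it.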

Paper Structure

This paper contains 11 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: An overview of our GABMIL method. We first divide the input WSI into patches and extract their corresponding features using a pretrained model. The Spatial Information Mixing Module (SIMM) then integrates spatial information into the feature representations. Finally, the ABMIL model predicts the slide-level label.
  • Figure 2: (a) Illustration of the SIMM (BOTH configuration). Patch features are repositioned according to their original spatial arrangement. The BLOCK and GRID attention modules are then applied sequentially to integrate spatial information into the feature representations. (b) The BLOCK attention module captures spatial information within partitioned windows using an MLP layer. (c) The GRID attention module models spatial information within each partitioned grid using an MLP layer.
  • Figure 3: A slide from TCGA BRCA and its corresponding slide-level feature maps, averaged across channels, for different attention modules. From left to right: the original slide, the slide-level feature map of patch features extracted with an ImageNet-pretrained model, the slide-level feature map output by the corresponding attention module, and the contextualized slide-level feature map after the residual connection is applied.
  • Figure 4: AUPRC scores of the three design variants of SIMM (BLOCK, GRID, and BOTH) on the TCGA BRCA dataset, using features extracted with an ImageNet-pretrained model.
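Figures 1 to 3 describe SIMM as placing patch features back on their spatial layout, applying BLOCK and then GRID mixing, and adding a residual connection. The NumPy sketch below illustrates that pipeline under several assumptions not confirmed by this summary: BLOCK groups are contiguous P x P windows (local) and GRID groups are G x G tokens strided across the whole map (global), following the block/grid partitioning popularized by MaxViT; mixing is an MLP-Mixer-style token MLP with a residual connection (biases and normalization omitted); empty grid cells are zero-filled; and only occupied positions are passed on to ABMIL. All function names and sizes are hypothetical, and the authors' repository may differ.

```python
import math
import numpy as np

def reposition(features, coords, P, G):
    """Scatter (K, D) patch features onto a zero-padded (H, W, D) grid.

    coords: (K, 2) integer (row, col) positions in patch units. H and W are
    rounded up so that both the BLOCK size P and the GRID size G divide them.
    """
    m = math.lcm(P, G)
    H = m * math.ceil((int(coords[:, 0].max()) + 1) / m)
    W = m * math.ceil((int(coords[:, 1].max()) + 1) / m)
    grid = np.zeros((H, W, features.shape[1]), dtype=features.dtype)
    grid[coords[:, 0], coords[:, 1]] = features      # empty cells stay zero
    return grid

def block_partition(x, P):
    """(H, W, D) -> (H/P * W/P, P*P, D): contiguous P x P windows (local)."""
    H, W, D = x.shape
    x = x.reshape(H // P, P, W // P, P, D).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, P * P, D)

def block_unpartition(g, H, W, P):
    D = g.shape[-1]
    x = g.reshape(H // P, W // P, P, P, D).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, D)

def grid_partition(x, G):
    """(H, W, D) -> (H/G * W/G, G*G, D): G x G tokens strided across the map (global)."""
    H, W, D = x.shape
    x = x.reshape(G, H // G, G, W // G, D).transpose(1, 3, 0, 2, 4)
    return x.reshape(-1, G * G, D)

def grid_unpartition(g, H, W, G):
    D = g.shape[-1]
    x = g.reshape(H // G, W // G, G, G, D).transpose(2, 0, 3, 1, 4)
    return x.reshape(H, W, D)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def token_mix(groups, W1, W2):
    """Residual token-mixing MLP, applied per group and per channel.

    groups: (N, T, D); W1: (Hm, T); W2: (T, Hm). Biases and LayerNorm omitted.
    """
    h = gelu(np.einsum("ht,ntd->nhd", W1, groups))
    return groups + np.einsum("th,nhd->ntd", W2, h)

def simm_both(grid, P, G, block_w, grid_w):
    """BLOCK mixing followed by GRID mixing (the BOTH configuration)."""
    H, W, _ = grid.shape
    x = block_unpartition(token_mix(block_partition(grid, P), *block_w), H, W, P)
    return grid_unpartition(token_mix(grid_partition(x, G), *grid_w), H, W, G)

# Toy usage: 30 patches with 8-dim features on a sparse 10 x 10 layout.
rng = np.random.default_rng(0)
P, G, D, Hm = 2, 2, 8, 16
flat = rng.choice(100, size=30, replace=False)
coords = np.stack([flat // 10, flat % 10], axis=1)
features = rng.normal(size=(30, D))
grid = reposition(features, coords, P, G)
block_w = (0.1 * rng.normal(size=(Hm, P * P)), 0.1 * rng.normal(size=(P * P, Hm)))
grid_w = (0.1 * rng.normal(size=(Hm, G * G)), 0.1 * rng.normal(size=(G * G, Hm)))
mixed = simm_both(grid, P, G, block_w, grid_w)
contextual = mixed[coords[:, 0], coords[:, 1]]   # (30, D) features passed to ABMIL
channel_avg_map = mixed.mean(axis=-1)            # (H, W) channel-averaged map, cf. Figure 3
```

The design choice the sketch highlights is that a fixed group size (P*P tokens for BLOCK, G*G for GRID) lets a plain MLP mix tokens, so the cost grows linearly with the number of groups rather than quadratically with the number of patches as in full self-attention.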