Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning

Martim Afonso; Praphulla M. S. Bhawsar; Monjoy Saha; Jonas S. Almeida; Arlindo L. Oliveira

TL;DR

This study tackles the challenge of learning from slide-level labels in Whole Slide Images (WSI) by applying weakly supervised Multiple Instance Learning (MIL) with attention, aiming to identify Regions of Interest (RoIs) and heatmaps that reflect underlying morphologies. It compares Attention MIL (AMIL) and Additive MIL (AdMIL) architectures on two TCGA cohorts (TCGA-BRCA and TCGA-LUSC) for two tasks, tumor detection and TP53 mutation detection, across multiple magnifications. The results show high accuracy for tumor detection (AUC up to ~0.97) and reveal that higher magnifications improve mutation-detection performance, with AMIL generally delivering stronger RoI signals while AdMIL offers alternative interpretability through per-patch scores. These findings highlight the potential of MIL-based RoI discovery in digital pathology and provide a basis for interactive, morphology-driven exploration, with caveats about mutation-detection reliability and dataset noise, especially at lower magnifications. In the MIL formulation underpinning the approach, $Y \in \{0,1\}$ is the slide-level label and $X=\{x_1,\dots,x_K\}$ denotes the patch-level instances used to predict it.

Abstract

Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they pose a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at the slide level rather than the tile level. Not only are medical diagnoses recorded at the specimen level; the detection of oncogene mutations is also obtained experimentally and recorded at the slide level by initiatives like The Cancer Genome Atlas (TCGA). This creates a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out which cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). The approach was applied to tumor detection at low magnification levels and to TP53 mutation detection at various magnification levels. Our results show that a novel additive implementation of MIL matched the performance of the reference implementation (AUC 0.96), and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly from the perspective of the molecular pathologist, these different AI architectures show distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different magnification levels. Tellingly, TP53 mutation detection was most sensitive to features at the higher magnifications, where cellular morphology is resolved.
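To make the attention-based MIL idea concrete, the following is a minimal NumPy sketch of attention pooling over the patch embeddings of one slide. It is an illustration under stated assumptions, not the authors' implementation: the function name `attention_mil_forward`, the parameter names `V`, `w`, `clf_w`, `clf_b`, the embedding sizes, and the random weights are all hypothetical. It follows the general pattern described in this paper (a fully-connected layer with tanh, then a softmax that yields one attention score per patch), and the attention scores are what would be rendered as an RoI heatmap.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_mil_forward(X, V, w, clf_w, clf_b):
    """Attention pooling for one slide (bag).

    X: (K, D) array of K patch embeddings (hypothetical features).
    V: (D, H) and w: (H,) are the attention-layer weights (assumed shapes).
    clf_w: (D,) and clf_b: scalar form a linear slide-level classifier.
    """
    scores = np.tanh(X @ V) @ w              # one raw score per patch, shape (K,)
    attn = softmax(scores)                   # attention weights, sum to 1
    bag = attn @ X                           # weighted average embedding, shape (D,)
    logit = bag @ clf_w + clf_b
    prob = 1.0 / (1.0 + np.exp(-logit))      # estimate of P(Y = 1 | X)
    return prob, attn

# Toy usage with random numbers standing in for real patch features.
rng = np.random.default_rng(0)
K, D, H = 6, 8, 4
X = rng.normal(size=(K, D))
V = rng.normal(size=(D, H))
w = rng.normal(size=H)
clf_w = rng.normal(size=D)
prob, attn = attention_mil_forward(X, V, w, clf_w, 0.0)

# Patches with the largest attention are candidate Regions of Interest.
top_patches = np.argsort(attn)[::-1][:3]
```

In an additive variant, one would instead score each attention-weighted patch embedding with the classifier and sum the per-patch logits, which gives each patch an explicit contribution to the slide-level prediction; the exact form used by AdMIL is given in the paper's Model Architectures section.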

Paper Structure (20 sections, 8 equations, 9 figures, 2 tables)

Introduction
Materials and Methods
Multiple Instance Learning
Model Architectures
Results
Training Methods
Datasets
TCGA-BRCA
TCGA-LUSC
WSI Preprocessing Pipeline
WSI Metadata Extraction
Patch Fetching and Pre-processing
Feature Extraction
Classification and RoI Detection Results
Discussion
...and 5 more sections

Figures (9)

Figure 1: Models' Architectures. The attention layers are composed of fully-connected (FC) layers, followed by activation functions. The AMIL (a) uses a tanh as its activation function, while the AdMIL (b) uses LeakyReLU. At the end of the attention layers, the results for each embedding are passed to a softmax to produce the final attention scores.
Figure 2: Sampling method for 10x and 20x magnifications. K-means clustering is applied to the set of tiles chosen from the previous magnification $m$. N tiles are selected from each cluster, and the corresponding tiles at the desired magnification ($m+1$) are then fetched.
Figure 3: ROC curves for one of the runs of the Tumor Detection Task (5x magnification)
Figure 4: ROC curves for one of the runs of the TP53 mutation Detection Task (5x magnification)
Figure 5: ROC curves for one of the runs of the TP53 mutation Detection Task (10x magnification)
...and 4 more figures
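The sampling scheme described in the Figure 2 caption can be sketched as follows. This is a hedged illustration only: the helper names (`kmeans`, `sample_tiles`, `children_at_next_level`), the plain Lloyd's k-means, the random within-cluster selection, and the assumption that each tile maps to a 2x2 block of tiles when the magnification doubles are all hypothetical choices made for the example, not details taken from the paper.

```python
import numpy as np

def kmeans(features, n_clusters, n_iter=20, seed=0):
    # Plain Lloyd's algorithm; a library implementation could be used instead.
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = features[labels == c]
            if len(members) > 0:             # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels

def sample_tiles(features, coords, n_clusters, n_per_cluster, seed=0):
    """Pick up to n_per_cluster tiles from each k-means cluster."""
    rng = np.random.default_rng(seed)
    labels = kmeans(features, n_clusters, seed=seed)
    chosen = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        take = min(n_per_cluster, len(idx))
        if take:
            chosen.extend(rng.choice(idx, take, replace=False).tolist())
    return coords[sorted(chosen)]

def children_at_next_level(coord):
    # ASSUMPTION: magnification doubles (e.g. 5x -> 10x), so a tile at
    # grid position (x, y) covers a 2x2 block of tiles at the next level.
    x, y = coord
    return [(2 * x + dx, 2 * y + dy) for dx in (0, 1) for dy in (0, 1)]

# Toy usage: 40 tiles at magnification m with random 16-D features.
rng = np.random.default_rng(1)
features = rng.normal(size=(40, 16))
coords = np.array([(i % 8, i // 8) for i in range(40)])
selected = sample_tiles(features, coords, n_clusters=4, n_per_cluster=3)
next_level = [t for c in selected for t in children_at_next_level(c)]
```

Clustering before sampling spreads the selected tiles across visually distinct groups, so the tiles fetched at the next magnification are less likely to come from a single tissue pattern.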
