Dual Attention Model with Reinforcement Learning for Classification of Histology Whole-Slide Images

Manahil Raza, Ruqayya Awan, Raja Muhammad Saad Bashir, Talha Qaiser, Nasir M. Rajpoot

TL;DR

The paper proposes a dual attention approach whose two components are both inspired by the visual examination process of a pathologist. It achieves performance better than or comparable to state-of-the-art methods while processing less than 10% of the WSI at the highest magnification and reducing the time required to infer the WSI-level label by more than 75%.

Abstract

Digital whole slide images (WSIs) are generally captured at microscopic resolution and encompass extensive spatial data. Directly feeding these images to deep learning models is computationally intractable due to memory constraints, while downsampling the WSIs risks incurring information loss. Alternatively, splitting the WSIs into smaller patches may result in a loss of important contextual information. In this paper, we propose a novel dual attention approach consisting of two main components, both inspired by the visual examination process of a pathologist. The first component, a soft attention model, processes a low-magnification view of the WSI to identify relevant regions of interest (ROIs); a custom sampling method then extracts diverse and spatially distinct image tiles from the selected ROIs. The second component, a hard attention classification model, extracts a sequence of multi-resolution glimpses from each tile for classification. Since hard attention is non-differentiable, we train this component using reinforcement learning to predict the locations of the glimpses. This approach allows the model to focus on essential regions instead of processing the entire tile, in line with how a pathologist examines a slide. The two components are trained end-to-end using a joint loss function. The proposed model was evaluated on two WSI-level classification problems: human epidermal growth factor receptor 2 (HER2) scoring on breast cancer histology images, and prediction of the Intact/Loss status of two mismatch repair (MMR) biomarkers from colorectal cancer histology images. We show that the proposed model achieves performance better than or comparable to state-of-the-art methods while processing less than 10% of the WSI at the highest magnification and reducing the time required to infer the WSI-level label by more than 75%.
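The abstract does not spell out the custom sampling method, so the following is only an illustrative sketch of one way to draw "diverse and spatially distinct" tile locations from a low-magnification attention map: greedy top-k selection with neighbourhood suppression. The function names, the `min_dist` parameter, and the `downsample` factor are all assumptions introduced for this example, not the authors' implementation.

```python
import numpy as np

def sample_distinct_locations(attention, k, min_dist):
    """Greedily pick up to k high-attention cells that are at least
    min_dist cells apart (Chebyshev distance) from every earlier pick.
    Hypothetical helper; not the paper's sampling method."""
    att = attention.astype(float).copy()
    picks = []
    for _ in range(k):
        r, c = np.unravel_index(np.argmax(att), att.shape)
        if att[r, c] == -np.inf:
            break  # every remaining cell has been suppressed
        picks.append((int(r), int(c)))
        # Suppress a square neighbourhood so the next pick is spatially distinct.
        r0, r1 = max(r - min_dist + 1, 0), r + min_dist
        c0, c1 = max(c - min_dist + 1, 0), c + min_dist
        att[r0:r1, c0:c1] = -np.inf
    return picks

def to_full_resolution(picks, downsample):
    """Scale (row, col) attention-map cells to (x, y) pixel coordinates
    at the highest magnification, assuming a uniform downsample factor."""
    return [(c * downsample, r * downsample) for r, c in picks]
```

A short usage example on a toy attention map: the second-highest cell sits next to the highest one, so it is suppressed and the sampler moves on to more distant regions.

```python
att = np.zeros((6, 6))
att[1, 1], att[1, 2], att[4, 4], att[5, 0] = 0.9, 0.8, 0.7, 0.5
picks = sample_distinct_locations(att, k=3, min_dist=2)
print(picks)                          # [(1, 1), (4, 4), (5, 0)]
print(to_full_resolution(picks, 32))  # [(32, 32), (128, 128), (0, 160)]
```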

Paper Structure

This paper contains 27 sections, 13 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Examples of regions of interest from breast cancer whole slide images with their respective HER2 Scores from the training dataset.
  • Figure 2: Examples of regions of interest from colorectal cancer whole slide images: a) Loss status of MLH1, b) Intact status of MLH1, c) Loss status of PMS2, d) Intact status of PMS2.
  • Figure 3: Overview of the proposed dual attention method. a) The downsampled WSI $I_{0} \in \mathbb{R}^{h \times w \times C}$ is passed as input to the soft attention module, which produces the attention maps. The attention sampling method refines the attention maps and extracts image tiles of size $2048 \times 2048$ at $40\times$ from the locations $L_{F}$ depicted in the diagram. b) Each extracted image tile is passed to the hard attention module, which extracts $T$ multi-resolution glimpses from the image tile at $20\times$ and $40\times$. Information from all the glimpses is processed to assign a score to the image tile. Additionally, a downsampled version of the image tile provides contextual information for selecting the locations of the glimpses.
  • Figure 4: (left to right) The first column shows the original MLH1 WSI. The remaining columns show the evolution of the attention maps along different epochs as training progresses for the given slide.
  • Figure 5: Examples of image tiles of HER2 scores 2+ (top row) and 3+ (bottom row). The colored circles represent the locations of the six glimpses used for the prediction of the HER2 score for the respective tiles. The remaining images show the glimpses extracted at the circled locations at 40$\times$ and 20$\times$.
  • ...and 3 more figures
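
The multi-resolution glimpses described in the captions of Figures 3 and 5 can be illustrated with a small NumPy sketch. It assumes the tile is an `H x W x C` array at 40× that is at least `2 * size` pixels on each side, and it emulates the 20× view by cropping twice the field of view around the same location and average-pooling by a factor of 2. The function names, the glimpse size, and the border handling are assumptions made for illustration; the paper may extract glimpses differently.

```python
import numpy as np

def crop_centered(tile, cy, cx, size):
    """Crop a size x size window around (cy, cx), shifting it inward at borders."""
    h, w = tile.shape[:2]
    y0 = int(np.clip(cy - size // 2, 0, h - size))
    x0 = int(np.clip(cx - size // 2, 0, w - size))
    return tile[y0:y0 + size, x0:x0 + size]

def multires_glimpse(tile, cy, cx, size=64):
    """Return (high, low) glimpses with the same pixel size.
    high: native-resolution crop (treated as 40x).
    low:  crop with twice the field of view, pooled 2x (20x equivalent)."""
    high = crop_centered(tile, cy, cx, size)
    wide = crop_centered(tile, cy, cx, 2 * size).astype(float)
    # 2x2 average pooling: group rows and columns in pairs, then average each group.
    low = wide.reshape(size, 2, size, 2, -1).mean(axis=(1, 3))
    return high, low
```

Calling `multires_glimpse(tile, cy, cx)` once per predicted location yields a sequence of paired glimpses for a tile; choosing those locations is the part the paper trains with reinforcement learning, which this sketch does not cover.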