PATHS: A Hierarchical Transformer for Efficient Whole Slide Image Analysis

Zak Buzzard, Konstantin Hemker, Nikola Simidjievski, Mateja Jamnik

TL;DR

Pathology Transformer with Hierarchical Selection (PATHS) is a novel top-down method for hierarchical weakly supervised representation learning on slide-level tasks in computational pathology. It outperforms previous methods on slide-level prediction tasks despite processing only a small proportion of the slide.

Abstract

Computational analysis of whole slide images (WSIs) has seen significant research progress in recent years, with applications ranging across important diagnostic and prognostic tasks such as survival or cancer subtype prediction. Many state-of-the-art models process the entire slide (which may be as large as $150{,}000 \times 150{,}000$ pixels) as a bag of many patches, the size of which necessitates computationally cheap feature aggregation methods. However, a large proportion of these patches are uninformative, such as those containing only healthy or adipose tissue, adding significant noise and size to the bag. We propose Pathology Transformer with Hierarchical Selection (PATHS), a novel top-down method for hierarchical weakly supervised representation learning on slide-level tasks in computational pathology. PATHS is inspired by the cross-magnification manner in which a human pathologist examines a slide, recursively filtering patches at each magnification level to a small subset relevant to the diagnosis. Our method overcomes the complications of processing the entire slide, enabling quadratic self-attention and providing a simple interpretable measure of region importance. We apply PATHS to five datasets from The Cancer Genome Atlas (TCGA) and achieve superior performance on slide-level prediction tasks when compared to previous methods, despite processing only a small proportion of the slide.
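The recursive filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each patch splits into `factor * factor` children at the next magnification level, that a fixed number `k` of patches is retained per level, and that an importance score is available for every patch. The names `children`, `select_top_down`, and `toy_score` are hypothetical, and `toy_score` is an arbitrary stand-in for the learned importance values.

```python
def children(patch, factor=2):
    """Return the factor*factor sub-patches of (level, u, v) at the next level."""
    level, u, v = patch
    return [(level + 1, factor * u + du, factor * v + dv)
            for du in range(factor) for dv in range(factor)]


def select_top_down(root_patches, score, k, num_levels, factor=2):
    """Keep the k highest-scoring patches per level and expand only those.

    Returns one list of processed patches per magnification level.
    """
    current = list(root_patches)
    processed = []
    for level in range(num_levels):
        processed.append(current)
        if level == num_levels - 1:
            break  # no further magnification to descend into
        kept = sorted(current, key=score, reverse=True)[:k]
        current = [c for p in kept for c in children(p, factor)]
    return processed


def toy_score(patch):
    """Arbitrary deterministic stand-in for a learned importance score."""
    _, u, v = patch
    return (31 * u + 17 * v) % 7


# A 4 x 4 grid of patches at the lowest magnification (illustrative only).
roots = [(0, u, v) for u in range(4) for v in range(4)]
levels = select_top_down(roots, toy_score, k=3, num_levels=5)
print([len(patches) for patches in levels])  # [16, 12, 12, 12, 12]
```

In this toy setting, 64 patches are processed in total across five levels, whereas covering the whole slide at the highest of those levels would require 16 × 4⁴ = 4096 patches.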

Paper Structure

This paper contains 16 sections, 10 equations, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: Overview of our novel method, PATHS, which predicts a patient's relative hazard level given a whole slide image using a top-down hierarchical process along the slide's pyramidal structure, mimicking the workflow of a pathologist. The prediction $\hat{y}$ is made as a function of the slide-level features at each hierarchy level, $F^1, \dots, F^n$.
  • Figure 2: Architecture of the contextualisation module, which accounts for the hierarchical context of a patch $X^{m}_{u,v}$. The recurrent units are applied down the hierarchy, forming a tree-shaped RNN. In this example, $m_1=0.625$ and $M=2$.
  • Figure 3: Inference speed of PATHS (orange) compared to ABMIL (blue) when applied to a single new WSI, including I/O, patch pre-processing using UNI (which dominates latency), and model inference. The magnification levels shown correspond to those in our experiments ($m_5=10\times=1\mu\text{m}/\text{pixel}$). As pre-processing dominates latency, the results for ABMIL are very close to those for other full slide baselines. Values were averaged over 50 TCGA-BRCA slides on a high-performance A100 workstation, with standard error of the mean shown. The results clearly show the low latency of PATHS compared to methods that process the full slide, even for larger values of $K$.
  • Figure 4: Left: whole slide images from the CAMELYON17 dataset with human-annotated tumour regions marked in blue. Right: visualisation of the patches selected by PATHS across magnifications $0.625\times$ through $10\times$, and their corresponding importance values. (a) and (b) show strong coverage of the tumorous regions at all magnifications, although (c) shows that PATHS may fail to identify micrometastases in some challenging cases.
  • Figure 5: Number of patches loaded per slide for ABMIL (blue) compared to PATHS (orange) for various values of $K$. Values averaged over 50 slides from TCGA-BRCA, as with Figure 3. Unlike inference latency, this measure is not hardware dependent, and demonstrates clearly the exponential growth in the number of patches required by traditional MIL approaches compared to the linear number required by PATHS.
  • ...and 2 more figures
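
The scaling contrast noted in the caption of Figure 5 can be reproduced with a back-of-the-envelope count. The sketch below is an idealisation and does not use values from the paper: it assumes no background filtering, that every patch has `factor * factor` children at the next level, and that a top-down method keeps at most `k` patches per level. The function names and the numbers `n_lowest=64` and `k=20` are hypothetical.

```python
def full_slide_patches(n_lowest, num_levels, factor=2):
    """Patches needed to cover the whole slide at the highest of num_levels levels."""
    return n_lowest * (factor * factor) ** (num_levels - 1)


def top_down_patches(n_lowest, num_levels, k, factor=2):
    """Total patches loaded across all levels when only k patches are expanded per level."""
    total = n_lowest
    current = n_lowest
    for _ in range(num_levels - 1):
        current = min(k, current) * factor * factor  # children of the kept patches
        total += current
    return total


for n in range(1, 6):
    # Full-slide count grows geometrically with depth; top-down count grows linearly.
    print(n, full_slide_patches(64, n), top_down_patches(64, n, k=20))
```

Under these assumptions, five levels require 16384 patches for the full slide but only 384 for the top-down procedure, and each additional level adds a constant 80 patches to the latter.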