GRASP: GRAph-Structured Pyramidal Whole Slide Image Representation

Ali Khajegili Mirabadi, Graham Archibald, Amirali Darbandsari, Alberto Contreras-Sanz, Ramin Ebrahim Nakhli, Maryam Asadi, Allen Zhang, C. Blake Gilks, Peter Black, Gang Wang, Hossein Farahani, Ali Bashashati

TL;DR

GRASP tackles cancer subtyping from gigapixel WSIs by encoding multi-magnification information as a fixed-structure graph and applying intra- and inter-magnification interactions via a 3-layer GCN. It introduces a convergence-based pooling mechanism that replaces traditional pooling, enabling intra-magnification aggregation and dynamic magnification consultation. The authors provide theoretical analysis of convergence and empirical evidence across three cancer datasets, showing strong accuracy with far fewer parameters than competing multi-magnification models, and they validate interpretability with expert pathologists. The work suggests a principled, lightweight, interpretable graph-based approach for WSI representation in digital pathology with potential for clinical deployment.

Abstract

Cancer subtyping is one of the most challenging tasks in digital pathology, where Multiple Instance Learning (MIL) over gigapixel whole slide images (WSIs) has been the focus of recent research. However, MIL approaches do not take advantage of the inter- and intra-magnification information contained in WSIs. In this work, we present GRASP, a novel lightweight graph-structured multi-magnification framework for processing WSIs in digital pathology. Our approach is designed to dynamically emulate the pathologist's behavior in handling WSIs and benefits from the hierarchical structure of WSIs. GRASP, which introduces a convergence-based node aggregation mechanism replacing traditional pooling mechanisms, outperforms state-of-the-art methods by a wide margin in terms of balanced accuracy, while being significantly smaller than the closest-performing state-of-the-art models in terms of the number of parameters. Our results show that GRASP is dynamic in finding and consulting different magnifications for subtyping cancers, is reliable and stable across different hyperparameters, and can generalize when using features from different backbones. The model's behavior has been evaluated by two expert pathologists, confirming the interpretability of the model's dynamics. We also provide a theoretical foundation, along with empirical evidence, for our work, explaining how GRASP interacts with different magnifications and nodes in the graph to make predictions. We believe that the strong characteristics yet simple structure of GRASP will encourage the development of interpretable, structure-based designs for WSI representation in digital pathology. Data and code can be found at https://github.com/AIMLab-UBC/GRASP

Paper Structure

This paper contains 28 sections, 4 theorems, 31 equations, 9 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Suppose the graph convolutional layers have $L_2$-bounded weights and the graph node features at $l=0$ are $L_2$-bounded. Then, $\forall i, j \in [1,...,m]$,

Figures (9)

  • Figure 1: A chronological overview of different WSI representation methods and their performance compared to the size of the model.
  • Figure 2: Overview of our workflow, beginning with WSIs and outputting slide-level subtype predictions. a) shows the WSI being tiled into patches of varying magnification, which are then embedded and assembled into a hierarchical graph. In b), graph representations are fed into a three-layer GCN and subsequently a two-layer MLP to predict graph-level (slide-level) subtypes. As shown in the message passing steps in b), nodes in the first GCN layer interact with their immediate neighbors; those in the second GCN layer can interact with their second neighbors; and nodes in the final GCN layer can interact with all nodes in the graph. Then, the inter-magnification convergence causes the nodes within each magnification to converge, which is an intrinsic property of the architecture. In the end, the three converged nodes are averaged by the readout module and passed to the FC layers. This dynamic helps the model look for important messages in the entire graph: if a node contains important information, it is broadcast to all other nodes in the graph. (For the sake of illustration, $m=4$ is used to show the structure of GRASP.)
  • Figure 3: Histogram of the consultations conducted by GRASP with different magnifications. First, this shows that GRASP actively draws information from different magnifications, benefiting from its multi-magnification structure. Second, information is distributed differently over magnifications depending on the subtype and slide, and there is no single optimal magnification for a subtype. For example, in the Bladder dataset, '$(5x\&10x\&20x)$' shows that the model needed to consult all three magnifications for $19.3\%$ and $39.4\%$ of slides for MicroP and UCC, respectively; '$(5x)$' shows that the model focused mostly on the $5x$ magnification alone for $43.2\%$ and $5.6\%$ of slides for MicroP and UCC, respectively. This behavior resembles that of pathologists, who can diagnose massive MicroP tumors at lower magnifications but need to consult higher magnifications to confirm a minuscule mass of MicroP tumor. UCC, on the other hand, is hard to diagnose at lower magnifications and requires careful examination at different magnifications due to its morphological complexity, which matches the model's tendency to highlight more than one magnification in the majority of cases.
  • Figure 4: A case study on the Bladder dataset using KimiaNet features. a) Graph-based visualization: a random case of the subtype MicroP in the test data was selected to visualize its magnification heatmap, which shows the absolute gradient with respect to each node. The $5x$ magnification accounts for $66.01\%$ of the total energy the model spent on this slide, meaning that GRASP places the most emphasis on $5x$ for this slide. Patch-based visualization: GRASP highlights patches at the three magnifications of a region of interest. In the second row, the highlighted regions are the areas the model has identified as important, while it pays minimal attention to other regions. As confirmed by an expert pathologist, the model's highlights on the three patches are tumors. The model can thus differentiate MicroP tumors from other tissue textures despite being trained only to separate MicroP from UCC. b) shows a similar case for the subtype UCC, from a random slide in the test data. In this case, GRASP focuses on both $5x$ ($47.45\%$) and $10x$ ($33.38\%$) but gives more weight to $5x$. As confirmed by the expert pathologist, the regions highlighted by the model (yellowish areas in the second row) are tumorous neighborhoods. Therefore, GRASP can differentiate UCC tumors from other textures and healthy cells across multiple magnifications.
  • Figure 5: The structure of our hierarchical graph and the relationship between two given nodes $h_i$ and $h_k$ within and across different magnifications.
  • ...and 4 more figures
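
The pipeline described in the Figure 2 caption (a hierarchical graph, three GCN layers, an average readout, and a two-layer MLP) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes $m$ nodes per magnification, a complete graph within each magnification, one edge between corresponding nodes of adjacent magnifications, symmetric normalization with self-loops, and ReLU activations. All function names and the random weights are hypothetical.

```python
import random

def build_pyramid_adjacency(m, levels=3):
    """Assumed layout: node lv*m + i is patch i at magnification level lv."""
    n = levels * m
    adj = [[0.0] * n for _ in range(n)]
    for lv in range(levels):            # intra-magnification: complete graph
        for i in range(m):
            for j in range(m):
                if i != j:
                    adj[lv * m + i][lv * m + j] = 1.0
    for lv in range(levels - 1):        # inter-magnification: node i <-> node i
        for i in range(m):
            a, b = lv * m + i, (lv + 1) * m + i
            adj[a][b] = adj[b][a] = 1.0
    return adj

def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (an assumed choice)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def relu(mat):
    return [[max(v, 0.0) for v in row] for row in mat]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def grasp_like_forward(x, a_norm, gcn_weights, mlp_weights):
    """GCN layers, then mean readout over all nodes, then a two-layer MLP."""
    h = x
    for w in gcn_weights:
        h = relu(matmul(matmul(a_norm, h), w))
    pooled = [[sum(col) / len(h) for col in zip(*h)]]   # 1 x d readout
    w1, w2 = mlp_weights
    return matmul(relu(matmul(pooled, w1)), w2)[0]

def receptive_field_covers_graph(adj, hops):
    """True if every node can reach every other node within `hops` steps."""
    n = len(adj)
    for start in range(n):
        seen, frontier = {start}, {start}
        for _ in range(hops):
            frontier = {j for i in frontier for j in range(n)
                        if adj[i][j] > 0} - seen
            seen |= frontier
        if len(seen) < n:
            return False
    return True

if __name__ == "__main__":
    rng = random.Random(0)
    m, d_in, d_hid, n_classes = 4, 8, 6, 2
    adj = build_pyramid_adjacency(m)
    x = random_matrix(3 * m, d_in, rng)                 # stand-in patch embeddings
    gcn = [random_matrix(d_in, d_hid, rng),
           random_matrix(d_hid, d_hid, rng),
           random_matrix(d_hid, d_hid, rng)]
    mlp = (random_matrix(d_hid, d_hid, rng), random_matrix(d_hid, n_classes, rng))
    print(grasp_like_forward(x, normalize(adj), gcn, mlp))
    print(receptive_field_covers_graph(adj, 2), receptive_field_covers_graph(adj, 3))
```

Under these assumptions and with $m=4$, two hops are not enough to connect a $5x$ node to a non-corresponding $20x$ node, while three hops reach every node. This matches the caption's statement that nodes in the final GCN layer can interact with all nodes in the graph.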

Theorems & Definitions (8)

  • Theorem 1
  • Corollary 1
  • Lemma 1
  • Proof
  • Proof
  • Remark 1
  • Remark 2
  • Corollary 2