Sequential Attention-based Sampling for Histopathological Analysis

Tarun G, Naman Malpani, Gugan Thoppe, Sridharan Devarajan

TL;DR

SASHA addresses cancer diagnosis from gigapixel whole-slide images (WSIs) by combining sequential reinforcement learning (RL) with a hierarchical, attention-based multiple instance learning (MIL) framework. It learns to select and zoom into a small subset of informative high-resolution patches (around 10–20%), using a Hierarchical Attention-based Feature Distiller (HAFED) for robust feature distillation and a Targeted State Updater (TSU) for efficient, similarity-driven state updates, with the RL agent trained using Proximal Policy Optimization (PPO). Empirically, SASHA matches or exceeds state-of-the-art performance while substantially reducing memory footprint and inference time, and its targeted patch sampling improves calibration and explainability. The approach offers a scalable, interpretable solution for automated histopathology that preserves diagnostic accuracy at much lower resource cost.
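The TL;DR states that SASHA's agent is trained with PPO. For reference, here is a minimal plain-Python sketch of the standard PPO clipped surrogate objective. The function name, the clip range `eps=0.2`, and the omission of value-function and entropy terms are our assumptions, not details taken from SASHA's implementation.

```python
import math

def ppo_clipped_loss(new_logps, old_logps, advantages, eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over a batch.

    new_logps / old_logps: log pi(a_t | S_t) under the current policy and
    under the policy that collected the data; advantages: estimates of A_t.
    eps is an assumed clip range, not a value reported for SASHA.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)             # r_t = pi_new / pi_old
        clipped = min(max(ratio, 1 - eps), 1 + eps)   # clip(r_t, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)      # pessimistic surrogate
    return -total / len(advantages)                   # minimize the negative
```

For example, with identical old and new policies every ratio is 1, so `ppo_clipped_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])` returns the negative mean advantage, -2.0. Clipping only takes effect once the probability ratio leaves `[1 - eps, 1 + eps]`, which limits how far a single update can move the policy.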

Abstract

Deep neural networks are increasingly applied in automated histopathology. Yet whole-slide images (WSIs) are often acquired at gigapixel sizes, rendering them computationally infeasible to analyze entirely at high resolution. Diagnostic labels are largely available only at the slide level, because expert annotation of images at a finer (patch) level is both laborious and expensive. Moreover, regions with diagnostic information typically occupy only a small fraction of the WSI, making it inefficient to examine the entire slide at full resolution. Here, we propose SASHA (Sequential Attention-based Sampling for Histopathological Analysis), a deep reinforcement learning approach for efficient analysis of histopathological images. First, SASHA learns informative features with a lightweight, hierarchical, attention-based multiple instance learning (MIL) model. Second, SASHA samples intelligently and zooms selectively into a small fraction (10–20%) of high-resolution patches to achieve reliable diagnoses. We show that SASHA matches state-of-the-art methods that analyze the WSI fully at high resolution, albeit at a fraction of their computational and memory costs. In addition, it significantly outperforms competing sparse-sampling methods. We propose SASHA as an intelligent sampling model for medical imaging challenges that involve automated diagnosis with exceptionally large images containing sparsely informative features. Model implementation is available at: https://github.com/coglabiisc/SASHA.
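To make the "hierarchical, attention-based MIL model" concrete, here is a minimal two-level attention-pooling sketch that follows the tensor shapes in the Figure 2 caption ($U \in \mathbb{R}^{N \times k \times d}$, $\alpha_1 \in \mathbb{R}^{N \times k}$, $\alpha_2 \in \mathbb{R}^N$). The linear scoring vectors `w1` and `w2` and the logistic classifier weights `c` are hypothetical simplifications; HAFED's actual attention networks and distillation losses are not described in this section.

```python
import math

def softmax(scores):
    m = max(scores)                                  # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    d = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]

def hierarchical_attention_mil(U, w1, w2, c):
    """Two-level attention pooling over one slide's patch features.

    U: N low-res patches, each holding k high-res feature vectors of size d.
    w1, w2: hypothetical scoring vectors for attention levels 1 and 2.
    c: hypothetical weights of a logistic slide-level classifier.
    """
    # Level 1: attend over the k high-res patches inside each low-res patch.
    alpha1 = [softmax([dot(w1, h) for h in patch]) for patch in U]   # N x k
    Z = [weighted_sum(a, patch) for a, patch in zip(alpha1, U)]      # N x d
    # Level 2: attend over the N pooled low-res patch embeddings.
    alpha2 = softmax([dot(w2, z) for z in Z])                        # N
    slide = weighted_sum(alpha2, Z)                                  # d
    prob = 1.0 / (1.0 + math.exp(-dot(c, slide)))                    # P(tumor)
    return alpha1, alpha2, prob
```

With all-zero weights every attention row is uniform and the predicted probability is 0.5; learned weights shift attention mass toward the sub-patches and patches whose features score highest, which is also what makes the attention maps usable for explanation.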

Paper Structure

This paper contains 30 sections, 2 equations, 5 figures, 13 tables, and 1 algorithm.

Figures (5)

  • Figure 1: SASHA: an attention-augmented deep RL model. Starting with low-resolution WSI features $S_0 \in \mathbb{R}^{N \times d}$, at each time step $t$ the deep RL agent selects a patch $a_t$ (red outline) to zoom into at high resolution, based on the policy $\pi(a_t \mid S_t)$. High-resolution features $V(a_t) \in \mathbb{R}^{d}$ are extracted and aggregated using a Hierarchical Attention-based Feature Distiller (HAFED; see the HAFED section). The states $S_t$ of patches with features similar to $a_t$ are updated with a Targeted State Updater (TSU; see the TSU section). At the end of each episode, the classifier (see the HAFED section) predicts the presence or type of cancer.
  • Figure 2: Components of SASHA. (a) HAFED. During training, features from every patch in each WSI are extracted and aggregated with a hierarchical, attention-based feature distiller (HAFED). The high-resolution input $U \in \mathbb{R}^{N \times k \times d}$ undergoes two levels of attentional filtering: the first, with weights $\alpha_1 \in \mathbb{R}^{N \times k}$, operates across the $k$ high-resolution patches within each low-resolution patch; the second, with weights $\alpha_2 \in \mathbb{R}^N$, operates across the $N$ low-resolution patches to produce the final classifier prediction $\hat{Y}$. During inference, only the high-resolution features of the sampled patches are analyzed. (b) TSU. After each patch is sampled, the states of all patches whose features are correlated with it are updated concurrently by a targeted state updater (TSU) (see the HAFED and TSU sections for details).
  • Figure 3: (a–b) Patch selection and update strategy of SASHA's RL agent; box color encodes the time step as a fraction of the episode. (c–d, h) Inference time, compressibility, and expected calibration error (ECE) for SASHA and HAFED compared with other models. S-10% and S-20% refer to the SASHA-0.1 and SASHA-0.2 models, respectively. (e) Tumor fraction in patches sampled by SASHA's RL agent (S) versus non-sampled patches (NS). (f–g) Top-k attention-score overlap fraction and average attention score for patches sampled by SASHA's RL agent versus a random policy (Rnd). In (e–g), the left and right pairs of bars correspond to 10% and 20% observation budgets, respectively. *** indicates $p < 0.001$.
  • Figure 4: Evaluation of model performance on tumor slides from the test set, grouped into four categories. The first three columns correspond to tumor slides (TS) correctly classified by both SASHA and HAFED, across increasing tumor-fraction (TF) ranges; the fourth column shows TS misclassified by SASHA but correctly predicted by HAFED. The x-axis denotes the fraction of patches visited, binned with a width of 0.005. For each slide, we compute the average entropy, the average tumor-class prediction probability, and the maximum cumulative tumor-patch hit ratio (defined in the paper's hit-ratio equation) within each bin. These metrics are then averaged across all slides; solid lines indicate the mean and shaded regions the standard deviation.
  • Figure 5: Visualization of the observation path traced by the RL agent for a WSI. The first image shows the tumor-annotated region in the WSI. The second image illustrates the patches selected by the RL agent from the initial to the terminal time steps. The third image presents the similar patches updated by the TSU model.
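Figures 1 and 2(b) describe one episode: pick a patch, replace its low-resolution state with its high-resolution feature, then update the states of similar patches. Below is a minimal, non-learned sketch of that loop under an observation budget (for example, 10% of patches). The greedy `score` function stands in for the learned PPO policy $\pi(a_t \mid S_t)$, and a fixed cosine-similarity threshold `tau` with linear blending `mix` stands in for the learned TSU; `run_episode`, `hi_res`, `score`, `tau`, and `mix` are all hypothetical names and parameters.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def run_episode(S0, hi_res, score, budget_frac=0.1, tau=0.9, mix=0.5):
    """Sample patches one at a time under a fixed observation budget.

    S0: N low-resolution patch states, each a list of d floats.
    hi_res(i): hypothetical callable returning the high-res feature V(a_t).
    score(state): hypothetical stand-in for the policy; the agent takes the
        unvisited patch with the highest score (greedy, not a learned pi).
    tau, mix: hypothetical similarity threshold and blend weight standing in
        for the learned Targeted State Updater (TSU).
    """
    S = [list(s) for s in S0]
    N = len(S)
    budget = min(N, max(1, round(budget_frac * N)))   # e.g. 10% of N patches
    visited, seen = [], set()
    for _ in range(budget):
        candidates = [i for i in range(N) if i not in seen]
        a_t = max(candidates, key=lambda i: score(S[i]))
        visited.append(a_t)
        seen.add(a_t)
        v = hi_res(a_t)
        S[a_t] = list(v)              # zoomed-in patch gets its high-res feature
        for j in candidates:
            # Update only unvisited patches whose low-res features resemble a_t's.
            if j != a_t and cosine(S0[j], S0[a_t]) >= tau:
                S[j] = [(1 - mix) * s + mix * x for s, x in zip(S[j], v)]
    return visited, S
```

The design point this illustrates is that one high-resolution observation can revise the states of many unvisited patches, so the agent's next choice is informed by more than the single patch it just zoomed into.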
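Figure 3(h) compares expected calibration error (ECE). As a reminder of what that metric measures, here is the common equal-width-bin definition, $\mathrm{ECE} = \sum_b \frac{|B_b|}{n}\,\lvert \mathrm{acc}(B_b) - \mathrm{conf}(B_b) \rvert$, in plain Python. The 10 bins and the half-open `[lo, hi)` binning convention are assumptions, since the bin settings behind Figure 3 are not given here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE.

    confidences: predicted probability of the predicted class, in [0, 1].
    correct: 1 (or True) if the prediction was right, else 0 (or False).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Half-open bins [lo, hi); conf == 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(o for _, o in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)   # weight by bin size
    return ece
```

For instance, two predictions made with confidence 0.9 that are both wrong give an ECE of 0.9, while five predictions at confidence 0.8 with four correct give an ECE of about 0; lower values mean predicted probabilities track observed accuracy more closely.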