Sequential Attention-based Sampling for Histopathological Analysis
Tarun G, Naman Malpani, Gugan Thoppe, Sridharan Devarajan
TL;DR
SASHA addresses cancer diagnosis from gigapixel whole-slide images by combining sequential reinforcement learning with a hierarchical, attention-based multiple instance learning (MIL) framework. It learns to select and zoom into a small subset of informative high-resolution patches (around 10–20%), using HAFED for robust feature distillation and TSU for efficient, similarity-driven state updates, with the sampling policy trained using proximal policy optimization (PPO). Empirically, SASHA matches or exceeds state-of-the-art performance while reducing memory footprint and inference time, and its targeted patch sampling improves calibration and explainability. The result is a scalable, interpretable approach to automated histopathology that preserves diagnostic accuracy at substantially lower resource cost.
Abstract
Deep neural networks are increasingly applied in automated histopathology. Yet whole-slide images (WSIs) are often acquired at gigapixel sizes, rendering them computationally infeasible to analyze entirely at high resolution. Diagnostic labels are largely available only at the slide level, because expert annotation of images at a finer (patch) level is both laborious and expensive. Moreover, regions with diagnostic information typically occupy only a small fraction of the WSI, making it inefficient to examine the entire slide at full resolution. Here, we propose SASHA (Sequential Attention-based Sampling for Histopathological Analysis), a deep reinforcement learning approach for efficient analysis of histopathological images. First, SASHA learns informative features with a lightweight, hierarchical, attention-based multiple instance learning (MIL) model. Second, SASHA samples intelligently and zooms selectively into a small fraction (10-20%) of high-resolution patches to achieve reliable diagnoses. We show that SASHA matches state-of-the-art methods that analyze the WSI fully at high resolution, at a fraction of their computational and memory costs. In addition, it significantly outperforms competing sparse-sampling methods. We propose SASHA as an intelligent sampling model for medical imaging challenges that involve automated diagnosis with exceptionally large images containing sparsely informative features. Model implementation is available at: https://github.com/coglabiisc/SASHA.
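To make the two steps in the abstract concrete, the following is a minimal sketch of attention-based MIL pooling combined with budgeted sequential sampling. It is not the authors' implementation. The function names, the parameters `v` and `w`, the arrays `low_res` and `high_res`, and the `budget` argument are all hypothetical, and the greedy "pick the highest-attention unvisited patch" rule is a simple stand-in for the learned reinforcement learning policy described above.

```python
import numpy as np


def attention_pool(features, w, v):
    """Attention-based MIL pooling: softmax-weighted average of patch features.

    features: (n_patches, d), v: (d, h), w: (h,). All names are illustrative.
    """
    scores = np.tanh(features @ v) @ w           # one score per patch
    scores = scores - scores.max()               # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ features, weights


def sample_budgeted(low_res, high_res, w, v, budget=0.2):
    """Zoom into a fixed fraction of patches, one patch per step.

    ASSUMPTION: a greedy rule replaces the learned policy. At each step the
    unvisited patch with the largest attention weight is swapped from its
    low-resolution feature to its high-resolution feature.
    """
    n = low_res.shape[0]
    n_steps = max(1, int(round(budget * n)))
    state = low_res.copy()                       # current slide state
    visited_mask = np.zeros(n, dtype=bool)
    visited = []
    for _ in range(n_steps):
        _, weights = attention_pool(state, w, v)
        masked = np.where(visited_mask, -np.inf, weights)
        idx = int(np.argmax(masked))             # most informative unvisited patch
        state[idx] = high_res[idx]               # "zoom in" on that patch
        visited_mask[idx] = True
        visited.append(idx)
    slide_embedding, _ = attention_pool(state, w, v)
    return slide_embedding, visited
```

With 50 patches and `budget=0.2`, the loop visits 10 distinct patches and returns a single slide-level embedding that a downstream classifier could consume. The design point the sketch illustrates is that only the visited patches ever require high-resolution features, which is where the savings in compute and memory would come from.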
