PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification
Sharon Peled, Yosef E. Maruvka, Moti Freiman
TL;DR
PSA-MIL tackles the challenge of integrating spatial context into multiple instance learning (MIL) for whole-slide images (WSIs) by reframing self-attention as a probabilistic posterior with learnable spatial priors. It introduces dynamic distance-decay priors, a diversity loss that pushes attention heads toward distinct spatial patterns, and a spatial pruning mechanism that cuts the quadratic cost of self-attention while preserving spatial dependencies. Empirical results across cancer subtyping, metastasis detection, and survival prediction show state-of-the-art performance at lower computational cost, underscoring the practical value of data-driven spatial modeling in WSI analysis. The approach provides a principled, adaptable framework for capturing complex tissue structures, with potential for broader clinical deployment.
Abstract
Whole Slide Images (WSIs) are high-resolution digital scans widely used in medical diagnostics. WSI classification is typically approached using Multiple Instance Learning (MIL), where the slide is partitioned into tiles treated as interconnected instances. While attention-based MIL methods aim to identify the most informative tiles, they often fail to fully exploit the spatial relationships among them, potentially overlooking intricate tissue structures crucial for accurate diagnosis. To address this limitation, we propose Probabilistic Spatial Attention MIL (PSA-MIL), a novel attention-based MIL framework that integrates spatial context into the attention mechanism through learnable distance-decayed priors, formulated within a probabilistic interpretation of self-attention as a posterior distribution. This formulation allows spatial relationships to be inferred dynamically during training, eliminating the need for the predefined assumptions imposed by many previous approaches. Additionally, we propose a spatial pruning strategy for the posterior that effectively reduces the quadratic complexity of self-attention. To further enhance spatial modeling, we introduce a diversity loss that encourages variation among attention heads, so that each head captures a distinct spatial representation. Together, these components give PSA-MIL a more data-driven and adaptive integration of spatial context that moves beyond predefined constraints. PSA-MIL achieves state-of-the-art performance compared with both contextual and non-contextual baselines, while significantly reducing computational cost.
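The probabilistic reading of attention described above can be illustrated with a small forward-pass sketch. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes an exponential distance-decay prior p(j | i) ∝ exp(-λ_h · d_ij) with one decay rate λ_h per head, so the posterior attention becomes softmax_j(q_i · k_j / √d_h - λ_h · d_ij), where d_ij is the distance between tiles i and j. Pruning is modeled as dropping pairs whose unnormalized prior falls below a threshold ε (equivalently d_ij > -ln(ε) / λ_h), and the diversity term is a hypothetical choice: the mean pairwise cosine similarity between the heads' attention maps. The names `spatial_prior_attention`, `head_diversity_loss`, `decay`, and `prune_eps` are illustrative; in a trained model λ_h would be a learnable parameter (for example kept positive through a softplus).

```python
import numpy as np

# Hypothetical sketch; not the PSA-MIL reference implementation.


def spatial_prior_attention(x, coords, w_q, w_k, w_v, decay, prune_eps=1e-3):
    """Multi-head self-attention read as a posterior:
    softmax(content logits + log spatial prior).

    x:             (n, d) tile embeddings
    coords:        (n, 2) tile grid coordinates
    w_q, w_k, w_v: (h, d, d_h) per-head projections
    decay:         (h,) positive decay rates (assumed form; learnable in a real model)
    """
    # Pairwise Euclidean distances between tiles, shape (n, n).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    q = np.einsum("nd,hde->hne", x, w_q)
    k = np.einsum("nd,hde->hne", x, w_k)
    v = np.einsum("nd,hde->hne", x, w_v)
    d_h = q.shape[-1]
    # Content ("likelihood") term: scaled dot-product logits, shape (h, n, n).
    logits = np.einsum("hie,hje->hij", q, k) / np.sqrt(d_h)
    # Log of an unnormalized exponential distance-decay prior: -lambda_h * d_ij.
    # Its normalizing constant is the same for every j, so it cancels in the softmax.
    log_prior = -decay[:, None, None] * dist[None, :, :]
    scores = logits + log_prior
    # Pruning: drop pairs whose unnormalized prior exp(-lambda_h * d_ij) < eps.
    # Self-pairs (d_ij = 0) always survive, so every row keeps at least one entry.
    keep = log_prior >= np.log(prune_eps)
    scores = np.where(keep, scores, -np.inf)
    # Numerically stable softmax over j; pruned pairs get exactly zero weight.
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    out = np.einsum("hij,hje->hie", attn, v)
    return out, attn, keep


def head_diversity_loss(attn):
    """Hypothetical diversity penalty: mean pairwise cosine similarity between
    flattened per-head attention maps (lower means more diverse heads)."""
    h = attn.shape[0]
    flat = attn.reshape(h, -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = flat @ flat.T
    off_diag = sim[~np.eye(h, dtype=bool)]
    return off_diag.mean()


# Usage on a toy 4 x 4 grid of tiles with two heads.
rng = np.random.default_rng(0)
n, d, h, d_h = 16, 8, 2, 4
grid = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
coords = np.stack(grid, axis=-1).reshape(-1, 2).astype(float)
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(h, d, d_h)) for _ in range(3))
decay = np.array([0.5, 3.0])  # a broad head and a local head
out, attn, keep = spatial_prior_attention(x, coords, w_q, w_k, w_v, decay)
print(out.shape, attn.shape)  # (2, 16, 4) (2, 16, 16)
print(keep.sum(axis=(1, 2)))  # pairs kept per head
print(head_diversity_loss(attn))
```

Because this sketch still builds the dense n × n score matrix, the mask only shows which pairs pruning would discard; realizing the compute savings would require gathering each tile's neighbors within the cutoff radius (for example with a spatial index) and scoring only those pairs. A larger λ_h yields a more local head, which is why the broad head here keeps every pair while the local head keeps only nearby ones.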
