SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification
Heng Fang, Sheng Huang, Wenhao Tang, Luwen Huangfu, Bo Liu
TL;DR
This paper tackles a limitation of traditional MIL for whole-slide image classification: it largely ignores global spatial context among patches. It introduces SAM-MIL, a spatial-context-aware MIL framework that leverages the Segment Anything Model to extract region-level context and integrates it through SAM-guided group masking, region-based group features, and a consistency loss applied to pseudo-bags. Empirical results on CAMELYON-16 and TCGA Lung Cancer show AUROC improvements over strong MIL baselines, supporting the value of explicit spatial context in WSI analysis. The approach also provides a plug-in SAM-based feature extractor and offers insights into how spatial context can guide attention and aggregation in MIL, with practical implications for pathology image analysis.
Abstract
Multiple Instance Learning (MIL) is the predominant framework in Whole Slide Image (WSI) classification, covering tasks such as sub-typing and diagnosis. Current MIL models predominantly rely on instance-level features derived from pretrained models such as ResNet. These models segment each WSI into independent patches and extract features from these local patches, leading to a significant loss of global spatial context and restricting the model's focus to local features. To address this issue, we propose a novel MIL framework, named SAM-MIL, that emphasizes spatial contextual awareness and explicitly incorporates spatial context by extracting comprehensive, image-level information. The Segment Anything Model (SAM) is a pioneering visual segmentation foundation model that can capture segmentation features without additional fine-tuning, making it an outstanding tool for extracting spatial context directly from raw WSIs. Our approach includes the design of group feature extraction based on spatial context and a SAM-Guided Group Masking strategy to mitigate class imbalance issues. We implement a dynamic mask ratio for different segmentation categories and supplement these with representative group features of the categories. Moreover, SAM-MIL divides instances to generate additional pseudo-bags, thereby augmenting the training set, and introduces consistency of spatial context across pseudo-bags to further enhance the model's performance. Experimental results on the CAMELYON-16 and TCGA Lung Cancer datasets demonstrate that our proposed SAM-MIL model outperforms existing mainstream methods in WSI classification. Our open-source implementation code is available at https://github.com/FangHeng/SAM-MIL.
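To make the masking and pseudo-bag ideas concrete, the following is a minimal sketch, not the released implementation. It assumes each patch already carries a segmentation-category id (for example, derived from SAM masks), and it uses an illustrative rule in which a category's mask ratio grows with its share of the bag, so that dominant categories lose more instances. The function names `group_mask` and `split_pseudo_bags`, the `base_ratio` parameter, and the ratio rule itself are hypothetical choices made for illustration; the repository linked above defines the actual procedure.

```python
import random
from collections import defaultdict


def group_mask(group_ids, base_ratio=0.5, seed=0):
    """Return sorted indices of instances kept after per-group random masking.

    Assumption (illustrative only): a group's mask ratio is
    base_ratio * (group size / largest group size), so larger groups
    are masked more aggressively. At least one instance per group is kept.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, gid in enumerate(group_ids):
        groups[gid].append(idx)
    if not groups:
        return []

    largest = max(len(members) for members in groups.values())
    kept = []
    for members in groups.values():
        ratio = base_ratio * len(members) / largest
        n_mask = min(int(ratio * len(members)), len(members) - 1)
        masked = set(rng.sample(members, n_mask))
        kept.extend(i for i in members if i not in masked)
    return sorted(kept)


def split_pseudo_bags(indices, n_bags=2, seed=0):
    """Shuffle the kept instance indices and deal them into n_bags pseudo-bags."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [sorted(shuffled[k::n_bags]) for k in range(n_bags)]


# Toy bag: 8 patches in category 0 and 2 patches in category 1.
toy_group_ids = [0] * 8 + [1] * 2
kept = group_mask(toy_group_ids, base_ratio=0.5)
pseudo_bags = split_pseudo_bags(kept, n_bags=2)
```

In this toy bag, half of the dominant category is masked while the minority category is left intact, and the surviving instances are split into two disjoint pseudo-bags that inherit the slide label. A consistency term across pseudo-bags, as described in the abstract, would then be computed on the model outputs for these bags.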
