WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images

Hong Liu, Haosen Yang, Paul J. van Diest, Josien P. W. Pluim, Mitko Veta

TL;DR

WSI-SAM extends SAM with multi-resolution patches to segment objects in histopathology images more precisely, while preserving SAM's efficient, prompt-driven design and zero-shot abilities.

Abstract

The Segment Anything Model (SAM) marks a significant advancement in segmentation models, offering robust zero-shot abilities and dynamic prompting. However, existing medical SAMs are not suited to the multi-scale nature of whole-slide images (WSIs), which restricts their effectiveness. To resolve this drawback, we present WSI-SAM, which enhances SAM with precise object segmentation capabilities for histopathology images using multi-resolution patches, while preserving its efficient, prompt-driven design and zero-shot abilities. To fully exploit pretrained knowledge while minimizing training overhead, we keep SAM frozen, introducing only minimal extra parameters and computational overhead. In particular, we introduce a High-Resolution (HR) token, a Low-Resolution (LR) token, and a dual mask decoder. This decoder combines the original SAM mask decoder with a lightweight fusion module that integrates features at multiple scales. Instead of predicting a mask for each resolution independently, we integrate the HR and LR tokens at an intermediate layer to jointly learn features of the same object across multiple resolutions. Experiments show that our WSI-SAM outperforms state-of-the-art SAM and its variants. In particular, our model outperforms SAM by 4.1 and 2.5 percentage points on a ductal carcinoma in situ (DCIS) segmentation task and a breast cancer metastasis segmentation task (CAMELYON16 dataset), respectively. The code will be available at https://github.com/HongLiuuuuu/WSI-SAM.
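
To make the idea of multi-resolution patches concrete, the following is a minimal sketch of how a high-resolution and a low-resolution patch of the same object could be laid out around a shared center, and how one box prompt maps into each of them. It is an illustration only, not the authors' implementation: the names `PatchRegion`, `concentric_patches`, and `box_to_patch`, the patch size of 1024 pixels, the downsample factors 4 and 16, and the assumption that the two patches are concentric are all hypothetical choices not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchRegion:
    """A square read region expressed in level-0 (full-resolution) pixels."""
    x: int           # left edge in level-0 coordinates
    y: int           # top edge in level-0 coordinates
    extent: int      # side length covered, in level-0 pixels
    downsample: int  # level-0 pixels per output pixel
    size: int        # side length of the output patch, in pixels


def concentric_patches(cx, cy, size=1024, hr_downsample=4, lr_downsample=16):
    """Return (HR, LR) regions of equal output size centered on (cx, cy).

    Assumption for illustration: both patches share the same center, so the
    LR patch shows the same object with more surrounding context.
    """
    regions = []
    for ds in (hr_downsample, lr_downsample):
        extent = size * ds  # level-0 footprint grows with the downsample
        regions.append(
            PatchRegion(cx - extent // 2, cy - extent // 2, extent, ds, size)
        )
    return tuple(regions)


def box_to_patch(box, region):
    """Map a level-0 box prompt (x0, y0, x1, y1) into patch pixel coordinates."""
    origins = (region.x, region.y, region.x, region.y)
    return tuple((v - o) / region.downsample for v, o in zip(box, origins))


# Hypothetical lesion centered at level-0 coordinates (20000, 30000).
hr, lr = concentric_patches(20000, 30000)
prompt = (19000, 29000, 21000, 31000)  # one box prompt in level-0 pixels
hr_box = box_to_patch(prompt, hr)
lr_box = box_to_patch(prompt, lr)
```

Under these assumptions the same box occupies a large part of the HR patch and a small, centered part of the LR patch, which is the kind of paired input a model with separate HR and LR tokens could consume.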

Paper Structure (21 sections, 1 equation, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Comparison of DCIS segmentation in H&E-stained breast tissue by SAM, MedSAM, and our WSI-SAM. Given the same green box as the input prompt on a $10\times$ magnification patch, SAM fails to segment the interior wall of the DCIS lesion. This error is compounded by the presence of calcifications and necrosis in the interior of the duct. MedSAM overlooks the ductal region beneath the lumen. Leveraging additional context (right), our WSI-SAM predicts the entire DCIS area more accurately, despite the intervening background and dark region.
  • Figure 2: WSI-SAM model architecture, which adds HR and LR tokens, a dual mask decoder, and token aggregation to SAM to improve mask quality in histopathology WSIs.
  • Figure 3: Example of predicted tissue segmentation on DCIS. Yellow boxes indicate incorrect predictions.
  • Figure 4: Comparison of zero-shot interactive segmentation results using a varying number of input points on the DCIS and CAMELYON16 datasets.
  • Figure 5: Comparison of segmentation mask predictions on the DCIS dataset.