Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

Zhiheng Cheng, Qingyue Wei, Hongru Zhu, Yan Wang, Liangqiong Qu, Wei Shao, Yuyin Zhou

TL;DR

H-SAM is a prompt-free adaptation of SAM for efficient fine-tuning on medical images via a two-stage hierarchical decoding procedure. Without using any unlabeled data, it outperforms state-of-the-art semi-supervised models that rely on extensive unlabeled training data across various medical datasets.

Abstract

The Segment Anything Model (SAM) has garnered significant attention for its versatile segmentation abilities and intuitive prompt-based interface. However, its application in medical imaging presents challenges, requiring either substantial training costs and extensive medical datasets for full model fine-tuning or high-quality prompts for optimal performance. This paper introduces H-SAM: a prompt-free adaptation of SAM tailored for efficient fine-tuning of medical images via a two-stage hierarchical decoding procedure. In the initial stage, H-SAM employs SAM's original decoder to generate a prior probabilistic mask, guiding a more intricate decoding process in the second stage. Specifically, we propose two key designs: 1) A class-balanced, mask-guided self-attention mechanism addressing the unbalanced label distribution, enhancing image embedding; 2) A learnable mask cross-attention mechanism spatially modulating the interplay among different image regions based on the prior mask. Moreover, the inclusion of a hierarchical pixel decoder in H-SAM enhances its proficiency in capturing fine-grained and localized details. This approach enables SAM to effectively integrate learned medical priors, facilitating enhanced adaptation for medical image segmentation with limited samples. Our H-SAM demonstrates a 4.78% improvement in average Dice compared to existing prompt-free SAM variants for multi-organ segmentation using only 10% of 2D slices. Notably, without using any unlabeled data, H-SAM even outperforms state-of-the-art semi-supervised models relying on extensive unlabeled training data across various medical datasets. Our code is available at https://github.com/Cccccczh404/H-SAM.
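
The abstract reports its results as average Dice. As a reminder of what that metric measures, here is a minimal sketch, not the authors' evaluation code. The function names `dice_per_class` and `mean_dice` are hypothetical, label 0 is assumed to be background, and skipping classes absent from both prediction and ground truth is an assumption about how empty classes are handled.

```python
def dice_per_class(pred, target, num_classes):
    """Dice = 2|P ∩ T| / (|P| + |T|) for each foreground label.

    pred and target are flat sequences of integer labels of equal length.
    Label 0 is assumed to be background and is not scored.
    """
    scores = {}
    for c in range(1, num_classes):
        p = sum(1 for x in pred if x == c)
        t = sum(1 for y in target if y == c)
        inter = sum(1 for x, y in zip(pred, target) if x == c and y == c)
        if p + t == 0:
            continue  # assumption: skip classes absent from both masks
        scores[c] = 2.0 * inter / (p + t)
    return scores


def mean_dice(pred, target, num_classes):
    """Average the per-class Dice scores over the classes that were scored."""
    scores = dice_per_class(pred, target, num_classes)
    return sum(scores.values()) / len(scores) if scores else float("nan")


# Toy 3x3 slice, flattened row by row, with two foreground organs (1 and 2).
pred = [0, 1, 1, 0, 2, 2, 0, 0, 2]
target = [0, 1, 1, 0, 1, 2, 0, 0, 2]
scores = dice_per_class(pred, target, num_classes=3)
avg = mean_dice(pred, target, num_classes=3)
```

In this toy example each organ overlaps the ground truth in two pixels out of five counted across both masks, so both per-class scores and their mean are 0.8.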

Paper Structure

This paper contains 43 sections, 4 equations, 8 figures, and 11 tables.

Figures (8)

  • Figure 1: H-SAM is advantageous in few-shot medical image segmentation. It achieves over 80% average Dice using only 10% of the slices for multi-organ segmentation, outperforming existing prompt-free SAM adaptation methods. Without using any unlabeled data, it even outperforms state-of-the-art semi-supervised models that use extensive unlabeled training data for prostate segmentation.
  • Figure 2: The H-SAM framework integrates a LoRA-adapted image encoder and a two-stage hierarchical decoder. We fine-tune the prompt encoder with default embeddings under a prompt-free setting. A key innovation is the hierarchical mask decoder, which uses predictions from the stage-1 decoder as priors to refine the segmentation through two mechanisms: Class-Balanced Mask-Guided Self-Attention (CMAttn) and Learnable Mask Cross-Attention. In addition, a hierarchical pixel decoder complements the enriched object queries derived from the transformer decoder.
  • Figure 3: Illustration of the Class-Balanced Mask-Guided Self-Attention (CMAttn) block.
  • Figure 4: Illustration of learnable mask-attention.
  • Figure 5: Qualitative results of H-SAM and other SAM variants, including SAMed, SAM Adapter, and AutoSAM.
  • ...and 3 more figures
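
To make concrete the idea of using a stage-1 prior mask to steer attention, here is a minimal sketch. It is not the authors' implementation: it assumes the prior enters as an additive log-probability bias on scaled dot-product logits (a soft analogue of masked attention), and it omits the learnable and class-balancing components that the captions mention. The function name `mask_biased_cross_attention` and all parameter names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mask_biased_cross_attention(queries, keys, values, prior, eps=1e-6):
    """Cross-attention whose logits are biased by a prior probabilistic mask.

    queries: Q x D object queries
    keys, values: N x D image features (one row per spatial location)
    prior: Q x N probabilities in [0, 1], e.g. a stage-1 mask per query
    Assumption: log(prior + eps) is added to the logits, so locations the
    prior considers unlikely receive little attention.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q, p_row in zip(queries, prior):
        logits = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + math.log(p + eps)
            for k, p in zip(keys, p_row)
        ]
        w = softmax(logits)
        weights.append(w)
        # Weighted sum of value vectors, one output channel at a time.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# One query over three locations; the prior rules out the third location.
queries = [[0.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
prior = [[0.9, 0.1, 0.0]]
out, attn = mask_biased_cross_attention(queries, keys, values, prior)
```

With an all-zero query the dot-product term vanishes, so the attention weights are simply the normalized prior and the third location contributes almost nothing to the output. In a trained model, the query-key term and the prior would jointly determine where each query looks.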