SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images

Shuhang Chen; Hangjie Yuan; Pengwei Liu; Hanxue Gu; Tao Feng; Dong Ni

TL;DR

SAMora adapts SAM to medical image segmentation with limited labels through a two-stage framework: it first trains three LoRA experts with self-supervised objectives at the image, patch, and pixel levels, then fuses them with HL-Attn during prompt-free fine-tuning. The image-, patch-, and pixel-level experts use SimCLRv2, MAE, and denoising autoencoders, respectively, and are continually pre-trained on domain-specific medical data; a cross-attention-based hierarchical fusion then combines them while the encoder and LoRA weights stay frozen during fine-tuning. Key contributions include the hierarchical LoRA fusion (HL-Attn), compatibility with SAM variants (e.g., SAM2, SAMed, H-SAM), and state-of-the-art performance on Synapse, LA, and PROMISE12 in both few-shot and fully supervised settings, with a 90% reduction in fine-tuning epochs and a LoRA rank of $r=4$. By leveraging abundant unlabeled data, the approach improves medical segmentation while remaining efficient, and the released code is intended to ease adoption across SAM-based pipelines.
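To give a sense of why a LoRA rank of $r=4$ is lightweight, the sketch below compares the trainable parameters of one full linear layer with those of its low-rank update $BA$ (with $B \in \mathbb{R}^{d_{out} \times r}$ and $A \in \mathbb{R}^{r \times d_{in}}$). The layer width of 768 is an illustrative assumption, not a figure taken from the paper, and the helper name `lora_param_counts` is hypothetical.

```python
def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple:
    """Return (full, lora) trainable-parameter counts for one linear layer.

    full: parameters of the dense weight W (d_out x d_in).
    lora: parameters of the low-rank factors B (d_out x r) and A (r x d_in).
    """
    full = d_out * d_in
    lora = r * (d_in + d_out)
    return full, lora


# Assumed layer width (768) for illustration only; r = 4 as stated in the TL;DR.
full, lora = lora_param_counts(768, 768, 4)
ratio = lora / full
print(f"full={full}, lora={lora}, ratio={ratio:.4f}")
# prints: full=589824, lora=6144, ratio=0.0104
```

Under these assumptions each adapted layer trains roughly 1% of the parameters of the dense weight, which is consistent with keeping three separate experts affordable.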

Abstract

The Segment Anything Model (SAM) has demonstrated significant potential in medical image segmentation. Yet, its performance is limited when only a small amount of labeled data is available, while there is abundant valuable yet often overlooked hierarchical information in medical data. To address this limitation, we draw inspiration from self-supervised learning and propose SAMora, an innovative framework that captures hierarchical medical knowledge by applying complementary self-supervised learning objectives at the image, patch, and pixel levels. To fully exploit the complementarity of hierarchical knowledge within LoRAs, we introduce HL-Attn, a hierarchical fusion module that integrates multi-scale features while maintaining their distinct characteristics. SAMora is compatible with various SAM variants, including SAM2, SAMed, and H-SAM. Experimental results on the Synapse, LA, and PROMISE12 datasets demonstrate that SAMora outperforms existing SAM variants. It achieves state-of-the-art performance in both few-shot and fully supervised settings while reducing fine-tuning epochs by 90%. The code is available at https://github.com/ShChen233/SAMora.
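The idea of attention-based fusion over the three expert levels can be illustrated with a toy example. The following is a minimal sketch of plain scaled dot-product attention for a single query token over one feature vector per expert; it is not the authors' HL-Attn module, whose exact design is not described in this section. Learned query/key/value projections are omitted, and all names and feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_experts(query, expert_feats):
    """Attend from one query vector over a list of expert feature vectors.

    Returns the attention weights and the weighted sum of expert features.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in expert_feats])
    fused = [
        sum(w * f[i] for w, f in zip(weights, expert_feats))
        for i in range(len(query))
    ]
    return weights, fused


# Toy features standing in for the image-, patch-, and pixel-level experts.
query = [1.0, 0.0, 0.0, 0.0]
experts = {
    "image": [1.0, 0.0, 0.0, 0.0],
    "patch": [0.0, 1.0, 0.0, 0.0],
    "pixel": [0.0, 0.0, 1.0, 0.0],
}
weights, fused = fuse_experts(query, list(experts.values()))
for name, w in zip(experts, weights):
    print(f"{name}: {w:.3f}")
```

In this toy setup the query aligns with the image-level feature, so that expert receives the largest weight while the other two share the remainder equally. The fused vector is a convex combination of the three expert features.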
