BALR-SAM: Boundary-Aware Low-Rank Adaptation of SAM for Resource-Efficient Medical Image Segmentation

Zelin Liu, Sicheng Dong, Bocheng Li, Yixuan Yang, Jiacheng Ruan, Chenxu Zhou, Suncheng Xiang

TL;DR

The Segment Anything Model (SAM), trained on natural images, underperforms on medical image segmentation because of domain shift, and adapting it is costly under clinical resource constraints. This work introduces BALR-SAM, a boundary-aware, parameter-efficient adaptation that combines a Complementary Detail Enhancement Network (CDEN), low-rank adapters within the Vision Transformer blocks, and a low-rank tensor attention mechanism in the mask decoder. The approach reports state-of-the-art results on ISIC17, Synapse, and polyp segmentation tasks while updating only 1.8% (11.7M) of the model's parameters, and the low-rank decoder attention cuts memory usage by 75% and speeds up inference. In practice, this makes accurate medical image segmentation feasible in resource-limited clinical settings and broadens access to foundation-model-based medical analysis.

Abstract

Vision foundation models like the Segment Anything Model (SAM), pretrained on large-scale natural image datasets, often struggle in medical image segmentation due to a lack of domain-specific adaptation. In clinical practice, fine-tuning such models efficiently for medical downstream tasks with minimal resource demands, while maintaining strong performance, is challenging. To address these issues, we propose BALR-SAM, a boundary-aware low-rank adaptation framework that enhances SAM for medical imaging. It combines three tailored components: (1) a Complementary Detail Enhancement Network (CDEN) that uses depthwise separable convolutions and multi-scale fusion to capture boundary-sensitive features essential for accurate segmentation; (2) low-rank adapters integrated into SAM's Vision Transformer blocks to optimize feature representation and attention for medical contexts while significantly reducing the number of trainable parameters; and (3) a low-rank tensor attention mechanism in the mask decoder, which cuts memory usage by 75% and boosts inference speed. Experiments on standard medical segmentation datasets show that BALR-SAM, without requiring prompts, outperforms several state-of-the-art (SOTA) methods, including fully fine-tuned MedSAM, while updating just 1.8% (11.7M) of its parameters.
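
The parameter savings described in the abstract can be illustrated with simple weight counts. The sketch below is not taken from the paper: the layer sizes (768, 16, 64, 128, 3) and all function names are hypothetical, chosen only to show why low-rank factorizations and depthwise separable convolutions are cheap. The only figures from the abstract are 11.7M trainable parameters and the 1.8% fraction, which together imply a total of roughly 650M parameters.

```python
def dense_params(d_in, d_out):
    """Weights in a full d_in x d_out linear map (bias ignored)."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Weights in a rank-r factorization: (d_in x r) followed by (r x d_out)."""
    return rank * (d_in + d_out)


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise mix."""
    return c_in * k * k + c_in * c_out


def implied_total_params(trainable, fraction):
    """Total parameter count implied by a trainable count and its fraction."""
    return trainable / fraction


# Hypothetical sizes, for illustration only (not from the paper).
d, r = 768, 16
print(dense_params(d, d), low_rank_params(d, d, r))    # 589824 24576
print(standard_conv_params(64, 128, 3),
      depthwise_separable_params(64, 128, 3))          # 73728 8768

# Figures from the abstract: 11.7M trainable parameters are 1.8% of the total.
print(round(implied_total_params(11.7e6, 0.018) / 1e6))  # 650 (million)
```

With these illustrative sizes, the rank-16 factorization uses 24 times fewer weights than the dense map, and the depthwise separable convolution uses roughly 8 times fewer than the standard one.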

Paper Structure

This paper contains 15 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of BALR-SAM. The image passes through the main branch, where the Segment Anything Model’s (SAM) image encoder performs initial feature extraction. In parallel, a complementary branch uses the Complementary Detail Enhancement Network (CDEN) to generate boundary-sensitive features, which are fused with the image encoder’s features via element-wise addition. Within the image encoder’s Vision Transformer (ViT) blocks, low-rank adapters are inserted at two positions, after multi-head self-attention and within the MLP residual path, to refine feature processing efficiently. The fused features then feed into the mask decoder, where the attention mechanism is reworked using a low-rank tensor decomposition of the query, key, and value matrices. Finally, the mask decoder’s prediction head outputs the predicted mask.
  • Figure 2: Visual comparison of our method on the CVC300, ColonDB, and ETIS datasets. GT denotes ground truth.
  • Figure 3: Visual comparison of our method on the ISIC17 dataset. GT denotes ground truth.
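
The data flow described in the Figure 1 caption (a low-rank adapter on a residual path, followed by element-wise fusion with the CDEN features) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `LowRankAdapter`, the helper `fuse`, the ReLU nonlinearity, the zero-initialized up-projection, and all sizes are hypothetical choices borrowed from common adapter practice.

```python
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def relu(vec):
    return [max(0.0, v) for v in vec]


class LowRankAdapter:
    """Bottleneck adapter: x + up(relu(down(x))), with rank << dim.

    Assumption: the up-projection starts at zero, so the adapter is an
    identity map before training. The paper may initialize differently.
    """

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        # down: rank x dim, small random values
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(rank)]
        # up: dim x rank, zeros
        self.up = [[0.0] * rank for _ in range(dim)]

    def num_params(self):
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])

    def __call__(self, x):
        hidden = relu(matvec(self.down, x))   # project to the low-rank space
        delta = matvec(self.up, hidden)       # project back to dim
        return [xi + di for xi, di in zip(x, delta)]  # residual connection


def fuse(encoder_feat, detail_feat):
    """Element-wise addition of encoder features and detail-branch features."""
    return [e + d for e, d in zip(encoder_feat, detail_feat)]


# Hypothetical token feature of width 8, adapted at rank 2.
adapter = LowRankAdapter(dim=8, rank=2)
token = [1.0] * 8
adapted = adapter(token)
fused = fuse(adapted, [0.5] * 8)
print(adapter.num_params())  # 32 weights instead of 64 for a dense 8 x 8 map
print(fused)
```

Because the up-projection starts at zero in this sketch, the adapter initially leaves the pretrained features unchanged, so training starts from the original model's behavior.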