PG-SAM: Prior-Guided SAM with Medical for Multi-organ Segmentation

Yiheng Zhong, Zihong Luo, Chengzhi Liu, Feilong Tang, Zelin Peng, Ming Hu, Yingzhen Hu, Jionglong Su, Zongyuan Ge, Imran Razzak

TL;DR

This work tackles SAM's limited accuracy in medical image segmentation due to domain gaps and noisy priors. It introduces PG-SAM, a three-fold approach comprising a Fine-Grained Modality Prior Aligner (FGMPA) that leverages medical LLMs and LoRA-fine-tuned CLIP, a Multi-level Feature Fusion (MLFF) module to integrate global semantics with local details, and an Iterative Mask Optimizer (IMO) for instance-specific mask refinement. The paper also presents a unified, prompt-free pipeline that enriches priors with medical expertise and enforces precise boundary delineation. On the Synapse dataset, PG-SAM achieves state-of-the-art results, delivering superior segmentation accuracy and boundary quality, which has practical implications for reliable multi-organ delineation in clinical workflows.

Abstract

Segment Anything Model (SAM) demonstrates powerful zero-shot capabilities; however, its accuracy and robustness significantly decrease when applied to medical image segmentation. Existing methods address this issue through modality fusion, integrating textual and image information to provide more detailed priors. In this study, we argue that the granularity of text and the domain gap affect the accuracy of the priors. Furthermore, the discrepancy between high-level abstract semantics and pixel-level boundary details in images can introduce noise into the fusion process. To address this, we propose Prior-Guided SAM (PG-SAM), which employs a fine-grained modality prior aligner to leverage specialized medical knowledge for better modality alignment. The core of our method lies in efficiently addressing the domain gap with fine-grained text from a medical LLM. Meanwhile, it also enhances the priors' quality after modality alignment, ensuring more accurate segmentation. In addition, our decoder enhances the model's expressive capabilities through multi-level feature fusion and iterative mask optimizer operations, supporting unprompted learning. We also propose a unified pipeline that effectively supplies high-quality semantic information to SAM. Extensive experiments on the Synapse dataset demonstrate that the proposed PG-SAM achieves state-of-the-art performance. Our code is released at https://github.com/logan-0623/PG-SAM.

Paper Structure

This paper contains 12 sections, 8 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Comparison of PG-SAM with other methods: (a) Issues with text granularity. (b) Fine-grained explicit text relies on manual verification and suffers from alignment problems. (c) Faces text granularity issues and lacks the zero-shot capabilities of a VLM. (d) Our pipeline improves modality alignment through VLM fine-tuning, providing fine-grained text and the most comprehensive improvements.
  • Figure 2: Overview of PG-SAM. (a) Illustrates the process by which the fine-grained modality prior aligner generates the Semantic Guide Matrix $G$; (b) Shows multi-level feature fusion, in which $G$ is integrated with the feature map after multi-level sampling to preserve more detailed features; (c) Outlines the iterative mask optimizer, which dynamically learns convolution kernel parameters via a Hypernetwork and refines the final mask using a dedicated refiner.
  • Figure 3: Comparison of textual prompts and corresponding heatmaps generated by the Base LLM (left) and the Medical LLM (right) for anatomical images of the spleen, stomach, and pancreas. The Medical LLM provides clinically precise descriptions, yielding more focused and detailed semantic guidance, as demonstrated by the sharper heatmap regions.
  • Figure 4: (a) Shows one of the focused areas in our semantic guide matrix; (b) Displays segmentation results from various methods on the Synapse dataset, with a focus on the gallbladder.
  • Figure 5: Comparing HD95 scores against trainable parameters for different SAMs, with GAP defined as percentage change.
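
For readers unfamiliar with the metric named in Figure 5, the following is a minimal sketch of how an HD95 score (95th-percentile Hausdorff distance) and a percentage-change gap might be computed. It is not taken from the PG-SAM code. It assumes that boundaries are given as lists of coordinate tuples, that HD95 is the larger of the two directed 95th-percentile distances (other toolkits pool both directions before taking the percentile), and that the gap is measured relative to a baseline value. The function names `hd95` and `percentage_change` are hypothetical.

```python
import math


def directed_distances(a, b):
    """For every point in a, the Euclidean distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def hd95(a, b):
    """Assumed variant: max of the two directed 95th-percentile distances."""
    return max(
        percentile(directed_distances(a, b), 95),
        percentile(directed_distances(b, a), 95),
    )


def percentage_change(value, baseline):
    """Assumed gap definition: relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    pred = [(0, 0), (1, 0), (2, 0)]   # toy predicted boundary points
    truth = [(0, 1), (1, 1), (2, 1)]  # toy ground-truth boundary points
    print("HD95:", hd95(pred, truth))
    print("Gap (%):", percentage_change(110.0, 100.0))
```

Lower HD95 indicates closer boundary agreement. Taking the 95th percentile instead of the maximum makes the score less sensitive to a few outlier boundary points.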