ASPS: Augmented Segment Anything Model for Polyp Segmentation
Huiqian Li, Dingwen Zhang, Jieru Yao, Longfei Han, Zhongyu Li, Junwei Han
TL;DR
This work addresses the domain gap that limits the Segment Anything Model (SAM) in endoscopic polyp segmentation by introducing Augmented SAM for Polyp Segmentation (ASPS). ASPS combines two modules. Cross-branch Feature Augmentation (CFA) fuses a trainable CNN encoder with the frozen ViT encoder via cross-branch attention and replaces the position embeddings to better capture local details. Uncertainty-guided Prediction Regularization (UPR) tunes normalization and uses SAM's IoU score as a hint to calibrate confidence and reduce uncertainty. The training objective is $\mathcal{L} = L_s + \lambda L_c$, which combines a segmentation loss $L_s = L_{ce} + 0.5 L_{dice} + L_{mse}$ with a confidence loss $L_c = -\log(c)$. The confidence averages an image-level term $c_i$ and a pixel-level term $c_p$, $c = \tfrac{1}{2}(c_i + c_p)$, where $c_p = 1 - \frac{1}{H\times W}\sum_{h=1}^{H}\sum_{w=1}^{W} U_p(h,w)$ and the uncertainty map $U_p = 1 - \sigma(|\mathbf{P}|)$ is computed element-wise from the prediction map $\mathbf{P}$ ($\sigma$ is the sigmoid). Evaluations on five polyp datasets show that ASPS outperforms SAM-based methods, achieving higher Dice and IoU on several datasets without requiring prompts; the code is publicly released. Together, CFA and UPR show strong domain generalization and practical potential for clinical polyp segmentation.
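The objective above can be made concrete with a short NumPy sketch. It illustrates the formulas under stated assumptions and is not the authors' implementation: $L_{ce}$ is taken as pixel-wise binary cross-entropy on logits, $L_{dice}$ as a soft Dice loss with smoothing `eps=1.0`, $c_i$ is assumed to be the IoU score predicted by SAM's IoU head, and $L_{mse}$ is passed in as a precomputed scalar because its target is not specified here. All function names and the weight `lam` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce_with_logits(logits, target):
    """Pixel-averaged binary cross-entropy on logits (assumed form of L_ce)."""
    return float(np.mean(np.maximum(logits, 0.0) - logits * target
                         + np.log1p(np.exp(-np.abs(logits)))))


def soft_dice_loss(logits, target, eps=1.0):
    """1 - soft Dice between sigmoid probabilities and the binary mask (assumed L_dice)."""
    p = sigmoid(logits)
    inter = np.sum(p * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(target) + eps))


def pixel_confidence(logits):
    """c_p = 1 - mean over (h, w) of U_p, with U_p = 1 - sigmoid(|P|) element-wise."""
    u_p = 1.0 - sigmoid(np.abs(logits))  # values in (0, 0.5]; exactly 0.5 where P = 0
    return float(1.0 - np.mean(u_p))


def confidence_loss(logits, iou_score):
    """L_c = -log(c), c = (c_i + c_p) / 2; iou_score stands in for c_i (assumption)."""
    c = 0.5 * (iou_score + pixel_confidence(logits))
    return float(-np.log(c))


def total_loss(logits, target, iou_score, l_mse, lam):
    """L = L_s + lam * L_c, with L_s = L_ce + 0.5 * L_dice + L_mse."""
    l_s = bce_with_logits(logits, target) + 0.5 * soft_dice_loss(logits, target) + l_mse
    return l_s + lam * confidence_loss(logits, iou_score)
```

Because $U_p$ peaks at 0.5 wherever a logit is 0, $c_p$ drops toward 0.5 for hesitant predictions and approaches 1 for saturated ones, so minimizing $-\log(c)$ pushes logits away from the decision boundary.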
Abstract
Polyp segmentation plays a pivotal role in colorectal cancer diagnosis. Recently, the Segment Anything Model (SAM) has opened up unprecedented potential for polyp segmentation, owing to its powerful pre-training on large-scale datasets. However, because of the domain gap between natural and endoscopy images, SAM faces two limitations in polyp segmentation. First, its Transformer-based structure prioritizes global and low-frequency information, potentially overlooking local details and introducing bias into the learned features. Second, when applied to endoscopy images, its poor out-of-distribution (OOD) performance leads to substandard predictions and biased confidence outputs. To tackle these challenges, we introduce a novel approach named Augmented SAM for Polyp Segmentation (ASPS), equipped with two modules: Cross-branch Feature Augmentation (CFA) and Uncertainty-guided Prediction Regularization (UPR). CFA combines a trainable CNN encoder branch with a frozen ViT encoder, injecting domain-specific knowledge while enhancing local features and high-frequency details. Moreover, UPR leverages SAM's IoU score to mitigate uncertainty during training, thereby improving OOD performance and domain generalization. Extensive experiments demonstrate the effectiveness of the proposed method in improving SAM's performance on polyp segmentation. Our code is available at https://github.com/HuiqianLi/ASPS.
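The CFA idea of letting frozen-ViT features draw on a trainable CNN branch can be illustrated with single-head cross-attention in NumPy. This is a hypothetical sketch, not the ASPS module: the attention direction (ViT tokens as queries over CNN tokens), the single head, the residual addition, and the placeholder projections `w_q`, `w_k`, `w_v` are all assumptions, and the position-embedding replacement is not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_branch_attention(vit_tokens, cnn_tokens, w_q, w_k, w_v):
    """Hypothetical fusion: ViT tokens (N, d) attend over CNN tokens (M, d)."""
    q = vit_tokens @ w_q  # (N, d) queries from the frozen global branch
    k = cnn_tokens @ w_k  # (M, d) keys from the trainable local branch
    v = cnn_tokens @ w_v  # (M, d) values carrying local, high-frequency detail
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, M), rows sum to 1
    # Residual add (assumed fusion rule): ViT features plus attended CNN context.
    return vit_tokens + attn @ v
```

A feature map of shape `(C, H, W)` would be flattened to an `(H*W, C)` token matrix before the call, for example with `feat.reshape(C, -1).T`. The residual form keeps the frozen ViT representation intact when the attended CNN context is zero.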
