TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks
Yang Yu, Chen Xu, Kai Wang
TL;DR
This work addresses the limited downstream performance of the Segment-Anything Model (SAM) by introducing TS-SAM, a unified fine-tuning framework that is lightweight and task-aware. It adds a Convolutional Side Adapter (CSA) that extracts and adapts SAM features, together with a Multi-Scale Refinement Module (MRM) and a two-stage Feature Fusion Decoder (FFD) that preserve fine-grained positional information during segmentation. Across ten public datasets spanning camouflaged object detection (COD), shadow detection, and salient object detection, TS-SAM significantly outperforms SAM-Adapter and SSOM and is competitive with state-of-the-art domain-specific models, especially on COD. The approach suggests that careful, parameter-efficient modular design can adapt large pretrained vision models to diverse segmentation tasks with limited training overhead and strong generalization.
Abstract
Adapter-based fine-tuning has been studied as a way to improve the performance of SAM on downstream tasks. However, a significant performance gap remains between fine-tuned SAMs and domain-specific models. To reduce this gap, we propose Two-Stream SAM (TS-SAM). On the one hand, inspired by the side network in Parameter-Efficient Fine-Tuning (PEFT), we design a lightweight Convolutional Side Adapter (CSA), which integrates the powerful features from SAM into side network training for comprehensive feature fusion. On the other hand, in line with the characteristics of segmentation tasks, we design a Multi-Scale Refinement Module (MRM) and a Feature Fusion Decoder (FFD) to preserve both detailed and semantic features. Extensive experiments on ten public datasets from three tasks demonstrate that TS-SAM not only significantly outperforms the recently proposed SAM-Adapter and SSOM, but also achieves competitive performance with SOTA domain-specific models. Our code is available at: https://github.com/maoyangou147/TS-SAM.
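The side-network idea mentioned in the abstract can be illustrated with a toy sketch. This is not the TS-SAM implementation: the class names FrozenBackbone and SideStream, the layer sizes, and the additive fusion rule are all assumptions made for illustration, and small dense layers stand in for both the SAM encoder and the convolutional adapters. It only shows the general pattern of a frozen backbone exposing per-layer features to a much smaller trainable stream.

```python
import random


def linear(weights, x):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def make_weights(rng, n_out, n_in):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def count_params(layers):
    return sum(len(row) for w in layers for row in w)


class FrozenBackbone:
    """Hypothetical stand-in for a pretrained encoder; never updated."""

    def __init__(self, rng, dim, depth):
        self.layers = [make_weights(rng, dim, dim) for _ in range(depth)]

    def forward(self, x):
        feats = []
        for w in self.layers:
            x = relu(linear(w, x))
            feats.append(x)  # keep every layer's features for the side stream
        return feats


class SideStream:
    """Hypothetical small trainable stream fed by the backbone features."""

    def __init__(self, rng, dim, side_dim, depth):
        self.side_dim = side_dim
        # Project backbone features down to the narrow side width.
        self.down = [make_weights(rng, side_dim, dim) for _ in range(depth)]
        # Mix the running side state with the injected features.
        self.mix = [make_weights(rng, side_dim, side_dim) for _ in range(depth)]

    def forward(self, feats):
        state = [0.0] * self.side_dim
        for down, mix, feat in zip(self.down, self.mix, feats):
            injected = linear(down, feat)
            # Additive fusion is an assumption, not the paper's stated rule.
            fused = [s + i for s, i in zip(state, injected)]
            state = relu(linear(mix, fused))
        return state


rng = random.Random(0)
backbone = FrozenBackbone(rng, dim=8, depth=3)
side = SideStream(rng, dim=8, side_dim=2, depth=3)

x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
feats = backbone.forward(x)
out = side.forward(feats)

backbone_params = count_params(backbone.layers)
side_params = count_params(side.down) + count_params(side.mix)
print(f"backbone params (frozen): {backbone_params}")
print(f"side params (trainable):  {side_params}")
```

In a real training loop only the side-stream weights would receive gradient updates. Because the backbone is used purely as a feature extractor, no gradients need to flow through it, which is the usual source of the memory and compute savings attributed to side-network PEFT methods.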
