TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks

Yang Yu; Chen Xu; Kai Wang

TL;DR

This work addresses the limited downstream performance of the Segment-Anything Model (SAM) by introducing TS-SAM, a unified and lightweight fine-tuning framework. It adds a Convolutional Side Adapter (CSA) to extract and adapt SAM features, along with a Multi-Scale Refinement Module (MRM) and a two-stage Feature Fusion Decoder (FFD) to preserve fine-grained positional information during segmentation. Across ten public datasets spanning camouflaged object detection (COD), shadow detection, and salient object detection, TS-SAM significantly outperforms SAM-Adapter and SSOM and is competitive with state-of-the-art domain-specific models, especially on COD. The approach suggests that a careful, parameter-efficient modular design can adapt large pretrained vision models to diverse segmentation tasks with limited training overhead.

Abstract

Adapter-based fine-tuning has been studied as a way to improve the performance of SAM on downstream tasks. However, there is still a significant performance gap between fine-tuned SAMs and domain-specific models. To reduce the gap, we propose Two-Stream SAM (TS-SAM). On the one hand, inspired by the side network in Parameter-Efficient Fine-Tuning (PEFT), we design a lightweight Convolutional Side Adapter (CSA), which integrates the powerful features from SAM into side network training for comprehensive feature fusion. On the other hand, in line with the characteristics of segmentation tasks, we design a Multi-scale Refinement Module (MRM) and a Feature Fusion Decoder (FFD) to keep both the detailed and semantic features. Extensive experiments on ten public datasets from three tasks demonstrate that TS-SAM not only significantly outperforms the recently proposed SAM-Adapter and SSOM, but also achieves competitive performance with the SOTA domain-specific models. Our code is available at: https://github.com/maoyangou147/TS-SAM.
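The side-network idea in the abstract (a frozen backbone whose intermediate features feed a small trainable convolutional branch) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' CSA. ToyFrozenEncoder is a hypothetical stand-in for the SAM image encoder, and the layer layout, channel widths, and names of ConvSideAdapterSketch are assumptions made for the example. It assumes PyTorch is installed.

```python
import torch
import torch.nn as nn


class ToyFrozenEncoder(nn.Module):
    """Hypothetical stand-in for a frozen image encoder (this is NOT SAM)."""

    def __init__(self, channels=16, depth=3):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.blocks = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                    nn.GELU(),
                )
                for _ in range(depth)
            ]
        )
        # Freeze every backbone weight; only the side branch is trained.
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)  # keep each intermediate feature map
        return feats


class ConvSideAdapterSketch(nn.Module):
    """Assumed layout: one small conv bottleneck per backbone layer."""

    def __init__(self, channels=16, hidden=8, depth=3):
        super().__init__()
        self.adapters = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Conv2d(channels, hidden, kernel_size=1),
                    nn.GELU(),
                    nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
                    nn.GELU(),
                    nn.Conv2d(hidden, channels, kernel_size=1),
                )
                for _ in range(depth)
            ]
        )

    def forward(self, backbone_feats):
        # The side stream starts at zero and accumulates adapted features,
        # reading the frozen backbone's output at every layer.
        side = torch.zeros_like(backbone_feats[0])
        for adapter, feat in zip(self.adapters, backbone_feats):
            side = side + adapter(side + feat)
        return side


if __name__ == "__main__":
    encoder = ToyFrozenEncoder()
    side_net = ConvSideAdapterSketch()
    images = torch.randn(2, 3, 32, 32)
    with torch.no_grad():  # no gradients are needed through the frozen encoder
        feats = encoder(images)
    fused = side_net(feats)
    print(fused.shape)
```

Because the backbone runs under torch.no_grad(), gradients flow only through the side branch, which is what keeps this style of fine-tuning cheap in memory and parameters. How TS-SAM actually wires the CSA into SAM's encoder is specified in the paper and the linked repository, not here.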

Paper Structure (12 sections, 10 equations, 3 figures, 4 tables)

Introduction
Proposed Methods
Overall Architecture
Convolutional Side Adapter
Multi-Scale Refine Module
Feature Fusion Decoder
Experiment
Datasets and Implementation
Results
Ablation Study
Conclusion
Acknowledgement

Figures (3)

Figure 1: Comparison of TS-SAM with SAM, SAM-Adapter, and the SOTA domain-specific models on sample images from the COD10K dataset.
Figure 2: (a) Overall architecture of TS-SAM. (b) The Convolutional Side Adapter (CSA), which extracts visual features from the SAM image encoder and adapts them to downstream tasks. (c) The Multi-scale Refinement Module (MRM), which extracts detailed features from images.
Figure 3: Structure of Feature Fusion Decoder (FFD), which injects $F_{mrm}^{1}$ and $F_{mrm}^{2}$ respectively into $F_{csa}$. Rectangular boxes represent feature maps of different scales, while rounded boxes represent different modules.
