SU-SAM: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes
Yiran Song, Qianyu Zhou, Xuequan Lu, Zhiwen Shao, Lizhuang Ma
TL;DR
This work addresses SAM's limited performance on specialized data by introducing SU-SAM, a simple, unified parameter-efficient fine-tuning (PEFT) framework that tunes SAM without task-specific designs. Adaptation is modeled as learning a residual $\Delta \boldsymbol{h}$ through Adapter/LoRA modules inserted into Transformer blocks, and four variants (Series, Parallel, Mixed, LoRA) explore different insertion strategies. Extensive experiments across nine datasets and six tasks show competitive or superior accuracy relative to state-of-the-art methods, and a generalized SU-SAM model trained on multiple datasets shows strong cross-task generalization. The approach is lightweight, data-agnostic, and easily extendable to other SAM-based and Transformer-based architectures, which makes it practical to deploy across diverse segmentation problems.
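The residual view above can be illustrated with a minimal pure-Python sketch. It assumes a toy frozen sublayer (one linear map followed by ReLU) standing in for a SAM Transformer sublayer, bottleneck adapters of the form $\mathrm{ReLU}(\boldsymbol{h}W_{down})W_{up}$ with zero-initialized up-projections, and a LoRA update scaled by alpha/rank. These choices, the "mixed" wiring (one series plus one parallel adapter in the same block), and every function and class name are illustrative assumptions, not SU-SAM's released implementation.

```python
import random

# Tiny dense-matrix helpers (matrices are lists of rows).
def matmul(x, w):
    return [[sum(xi[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))] for xi in x]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def relu(a):
    return [[max(0.0, v) for v in row] for row in a]

def scale(a, s):
    return [[s * v for v in row] for row in a]

def rand_matrix(rows, cols, rng, std=0.02):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

class Adapter:
    """Hypothetical bottleneck adapter: delta_h = relu(h @ W_down) @ W_up."""
    def __init__(self, dim, bottleneck, rng):
        self.w_down = rand_matrix(dim, bottleneck, rng)
        self.w_up = zeros(bottleneck, dim)  # zero init, so delta_h = 0 at start

    def delta(self, h):
        return matmul(relu(matmul(h, self.w_down)), self.w_up)

    def trainable_params(self):
        return len(self.w_down) * len(self.w_down[0]) + len(self.w_up) * len(self.w_up[0])

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * A @ B."""
    def __init__(self, w_frozen, rank, alpha, rng):
        d_in, d_out = len(w_frozen), len(w_frozen[0])
        self.w = w_frozen
        self.a = rand_matrix(d_in, rank, rng)
        self.b = zeros(rank, d_out)  # zero init, so the update starts at 0
        self.scaling = alpha / rank

    def __call__(self, x):
        low_rank = matmul(matmul(x, self.a), self.b)
        return add(matmul(x, self.w), scale(low_rank, self.scaling))

    def trainable_params(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

def frozen_sublayer(x, w):
    # Stand-in for a frozen Transformer sublayer (assumption: linear + ReLU).
    return relu(matmul(x, w))

def series(x, w, adapter):
    # Series: the adapter reads the sublayer output h and adds delta_h to it.
    h = frozen_sublayer(x, w)
    return add(h, adapter.delta(h))

def parallel(x, w, adapter):
    # Parallel: the adapter reads the sublayer input x, alongside the sublayer.
    return add(frozen_sublayer(x, w), adapter.delta(x))

def mixed(x, w, series_adapter, parallel_adapter):
    # Mixed (assumed wiring): one series and one parallel adapter together.
    h = frozen_sublayer(x, w)
    return add(add(h, series_adapter.delta(h)), parallel_adapter.delta(x))

def lora_variant(x, lora):
    # LoRA: delta_h enters through the low-rank update of the frozen weight.
    return relu(lora(x))
```

Because every trainable up-projection starts at zero, all four variants initially reproduce the frozen sublayer exactly, and training only moves the small residual $\Delta \boldsymbol{h}$. In this toy setting (dim 8, bottleneck or rank 2) each module has 32 trainable parameters, versus 64 in the frozen weight.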
Abstract
The Segment Anything Model (SAM) has demonstrated excellent generalizability in common vision scenarios, yet it falls short in understanding specialized data. Recently, several methods have combined parameter-efficient techniques with task-specific designs to fine-tune SAM on particular tasks. However, these methods rely heavily on handcrafted, complicated, task-specific designs and on pre-/post-processing to achieve acceptable performance on downstream tasks, which severely restricts their generalizability to other downstream tasks. To address this issue, we present a simple and unified framework, namely SU-SAM, that can easily and efficiently fine-tune SAM with parameter-efficient techniques while maintaining excellent generalizability across various downstream tasks. SU-SAM requires no task-specific designs and aims to significantly improve the adaptability of SAM-like models to underperformed scenes. Concretely, we abstract the parameter-efficient modules of different methods into basic design elements of our framework. We further propose four variants of SU-SAM, i.e., series, parallel, mixed, and LoRA structures. We conduct comprehensive experiments on nine datasets and six downstream tasks to verify the effectiveness of SU-SAM, including medical image segmentation, camouflaged object detection, salient object segmentation, surface defect segmentation, complex object shapes, and shadow masking. Our experimental results demonstrate that SU-SAM achieves competitive or superior accuracy compared to state-of-the-art methods. Furthermore, we provide in-depth analyses highlighting the effectiveness of different parameter-efficient designs within SU-SAM. In addition, we propose a generalized model and benchmark, showcasing SU-SAM's ability to generalize across all of the diverse datasets simultaneously.
