Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection

Shixuan Gao; Pingping Zhang; Tianyu Yan; Huchuan Lu

TL;DR

This work tackles the challenge of applying the Segment Anything Model (SAM) to Salient Object Detection (SOD), where SAM depends on prompts and misses fine-grained detail. It introduces Multi-Scale and Detail-Enhanced SAM (MDSAM), which integrates a Lightweight Multi-Scale Adapter (LMSA) to learn multi-scale representations with few trainable parameters, a Multi-Level Fusion Module (MLFM) to fuse multi-level encoder features, and a Detail Enhancement Module (DEM), together with a Multi-scale Edge Enhancement Module (MEEM), to recover fine edges. The approach reuses the pre-trained SAM weights while achieving strong SOD performance and broad generalization to other segmentation tasks, including camouflaged object detection (COD) and polyp segmentation, as supported by extensive experiments on SOD benchmark datasets and COD/polyp generalization settings. The authors provide a practical, efficient SAM-based framework for high-quality SOD and related segmentation tasks, and release the code for reproducibility. The reported improvements are measured with $MAE$, $F^{max}_\beta$, $S_m$, and $E_m$ across datasets, underscoring the method's effectiveness and generalization.
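Two of the metrics named above, $MAE$ and $F^{max}_\beta$, can be sketched in a few lines. This is a minimal sketch assuming the evaluation convention common in SOD work (saliency values in [0, 1], $\beta^2 = 0.3$, and 256 uniformly spaced binarization thresholds); the function names are hypothetical, $S_m$ and $E_m$ are omitted, and this is not the authors' evaluation code.

```python
def _flatten(grid):
    """Flatten a 2-D nested list (H x W) into a 1-D list."""
    return [v for row in grid for v in row]


def mae(pred, gt):
    """Mean absolute error between a saliency map and a ground-truth mask, both in [0, 1]."""
    p, g = _flatten(pred), _flatten(gt)
    return sum(abs(a - b) for a, b in zip(p, g)) / len(p)


def max_f_beta(pred, gt, beta2=0.3, steps=255):
    """Maximum F-measure over steps + 1 uniformly spaced binarization thresholds.

    beta2 = 0.3 is the weighting commonly used in SOD papers (an assumption here).
    """
    p = _flatten(pred)
    g = [v >= 0.5 for v in _flatten(gt)]  # binarize the ground truth
    best = 0.0
    for i in range(steps + 1):
        t = i / steps
        pos = [v >= t for v in p]  # binarized prediction at threshold t
        tp = sum(1 for a, b in zip(pos, g) if a and b)
        fp = sum(1 for a, b in zip(pos, g) if a and not b)
        fn = sum(1 for a, b in zip(pos, g) if not a and b)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0:
            continue  # F-measure undefined; skip this threshold
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall)
        best = max(best, f)
    return best


pred = [[0.9, 0.8, 0.1],
        [0.7, 0.2, 0.0]]
gt = [[1, 1, 0],
      [1, 0, 0]]
print(round(mae(pred, gt), 4))         # 0.15
print(round(max_f_beta(pred, gt), 4))  # 1.0
```

Lower $MAE$ and higher $F^{max}_\beta$ are better. Nested lists keep the sketch dependency-free; a real evaluation would run the same formulas over full-resolution arrays.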

Abstract

Salient Object Detection (SOD) aims to identify and segment the most prominent objects in images. Advanced SOD methods often utilize various Convolutional Neural Networks (CNNs) or Transformers for deep feature extraction. However, these methods still perform poorly and generalize weakly in complex cases. Recently, the Segment Anything Model (SAM) has been proposed as a vision foundation model with strong segmentation and generalization capabilities. Nonetheless, SAM requires accurate prompts for the target objects, which are unavailable in SOD. Additionally, SAM does not exploit multi-scale and multi-level information, nor does it incorporate fine-grained details. To address these shortcomings, we propose a Multi-scale and Detail-enhanced SAM (MDSAM) for SOD. Specifically, we first introduce a Lightweight Multi-Scale Adapter (LMSA), which allows SAM to learn multi-scale information with very few trainable parameters. Then, we propose a Multi-Level Fusion Module (MLFM) to comprehensively utilize the multi-level information from SAM's encoder. Finally, we propose a Detail Enhancement Module (DEM) to equip SAM with fine-grained details. Experimental results demonstrate the superior performance of our model on multiple SOD datasets and its strong generalization to other segmentation tasks. The source code is released at https://github.com/BellyBeauty/MDSAM.
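To make "very few trainable parameters" concrete, the sketch below counts the parameters of a hypothetical bottleneck adapter (down-projection, parallel depthwise convolutions with kernel sizes 3, 5 and 7, up-projection) and compares them with one frozen Transformer block, assuming SAM's ViT-B image encoder (embedding dimension 768, MLP ratio 4). The bottleneck width, kernel sizes and helper names are illustrative assumptions rather than the LMSA configuration from the paper, and SAM's relative position parameters are ignored.

```python
def linear_params(d_in, d_out, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return d_in * d_out + (d_out if bias else 0)


def depthwise_conv_params(channels, k, bias=True):
    """A k x k depthwise convolution has one k x k filter per channel."""
    return channels * k * k + (channels if bias else 0)


def adapter_params(dim=768, bottleneck=32, kernels=(3, 5, 7)):
    """Hypothetical adapter: down-projection, parallel depthwise convs, up-projection."""
    down = linear_params(dim, bottleneck)
    multi_scale = sum(depthwise_conv_params(bottleneck, k) for k in kernels)
    up = linear_params(bottleneck, dim)
    return down + multi_scale + up


def vit_block_params(dim=768, mlp_ratio=4):
    """Approximate size of one standard ViT block (attention, MLP, two LayerNorms)."""
    hidden = dim * mlp_ratio
    attn = linear_params(dim, 3 * dim) + linear_params(dim, dim)  # qkv + output projection
    mlp = linear_params(dim, hidden) + linear_params(hidden, dim)
    norms = 2 * (2 * dim)  # each LayerNorm has a weight and a bias vector
    return attn + mlp + norms


a, b = adapter_params(), vit_block_params()
print(a, b, f"{a / b:.2%}")  # 52704 7087872 0.74%
```

Under these assumptions the adapter adds 52,704 parameters per block, about 0.74% of the 7,087,872 in the frozen block, which is why adapter tuning can keep the pre-trained SAM weights fixed while training only a small number of new parameters.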


Paper Structure (20 sections, 24 equations, 19 figures, 9 tables)

Introduction
Related Work
  Salient Object Detection
  Segment Anything Model
Our Proposed Method
  Lightweight Multi-Scale Adapter
  Multi-Level Fusion Module
  Detail Enhancement Module
  Loss Functions
Experiments
  Experiment Settings
  Comparison with the State-of-the-arts
  Ablation Studies
Conclusion
Appendix
...and 5 more sections

Figures (19)

Figure 1: Illustration of our motivations. The mask generation process of (a) SAM with grid prompts, (b) SAM with selected prompts (points or boxes) and (c) our MDSAM without prompts. (d) and (f) are saliency maps predicted by SAM with Adapter. (e) and (g) are saliency maps predicted by our MDSAM. Our MDSAM locates salient objects more accurately and segments them with fine-grained details.
Figure 2: Overall architecture of the proposed MDSAM. It reuses the pre-trained weights of SAM and adds three novel modules: Lightweight Multi-Scale Adapter (LMSA), Multi-Level Fusion Module (MLFM) and Detail Enhancement Module (DEM). In addition, Weight Distributors (WD) and a Multi-scale Edge Enhancement Module (MEEM) are introduced to improve the feature representation ability.
Figure 3: Details of the proposed LMSA.
Figure 4: Architecture of the proposed MLFM.
Figure 5: Illustration of the proposed DEM.
...and 14 more figures
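Figures 2 and 5 refer to detail and edge enhancement. One simple way to sketch multi-scale edge enhancement, assumed here for illustration only, is to treat the difference between a map and its local average at several window sizes as a high-frequency "edge" residual and add it back. The single-channel toy map, the window sizes (3, 5), the gain `alpha` and all function names are hypothetical; the paper's MEEM operates on feature tensors and may be designed differently.

```python
def avg_pool(x, k):
    """k x k mean filter (stride 1); border windows average only in-bounds pixels."""
    h, w, r = len(x), len(x[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [x[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def edge_residual(x, k):
    """High-frequency residual: each pixel minus its k x k local mean."""
    smooth = avg_pool(x, k)
    return [[x[i][j] - smooth[i][j] for j in range(len(x[0]))] for i in range(len(x))]


def multi_scale_edge_enhance(x, kernels=(3, 5), alpha=1.0):
    """Add the mean of the edge residuals from several window sizes back to x."""
    edges = [edge_residual(x, k) for k in kernels]
    return [[x[i][j] + alpha * sum(e[i][j] for e in edges) / len(edges)
             for j in range(len(x[0]))] for i in range(len(x))]


step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]  # vertical step edge
out = multi_scale_edge_enhance(step)
print([round(v, 3) for v in out[1]])  # [-0.167, -0.417, 1.417, 1.167]
```

The step becomes steeper: pixels just left of the boundary are pushed below 0 and pixels just right of it above 1, while a constant region passes through unchanged because its residual is zero.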
