DarkSAM: Fooling Segment Anything Model to Segment Nothing

Ziqi Zhou, Yufei Song, Minghui Li, Shengshan Hu, Xianlong Wang, Leo Yu Zhang, Dezhong Yao, Hai Jin

TL;DR

This work identifies a critical vulnerability in the Segment Anything Model (SAM) by introducing DarkSAM, the first prompt-free universal adversarial perturbation (UAP) attack on SAM that holds across prompts and images. It combines a shadow target strategy, which derives a semantic blueprint of the image as the attack target, with a hybrid spatial-frequency attack that disrupts foreground and background semantics as well as texture information. Experiments on four datasets, against SAM and two of its variants, show strong attack performance and cross-model transferability, which points to the need for defenses against prompt-free UAPs in foundation vision models that address both spatial semantics and texture cues.

Abstract

Segment Anything Model (SAM) has recently gained much attention for its outstanding generalization to unseen data and tasks. Despite its promise, the vulnerabilities of SAM, especially to universal adversarial perturbation (UAP), have not yet been thoroughly investigated. In this paper, we propose DarkSAM, the first prompt-free universal attack framework against SAM, which comprises a semantic decoupling-based spatial attack and a texture distortion-based frequency attack. We first divide the output of SAM into foreground and background. Then, we design a shadow target strategy to obtain the semantic blueprint of the image as the attack target. DarkSAM fools SAM by extracting and destroying crucial object features from images in both the spatial and frequency domains. In the spatial domain, we disrupt the semantics of both the foreground and background in the image to confuse SAM. In the frequency domain, we further enhance the attack effectiveness by distorting the high-frequency components (i.e., texture information) of the image. Consequently, with a single UAP, DarkSAM renders SAM incapable of segmenting objects across diverse images with varying prompts. Experimental results on four datasets for SAM and its two variant models demonstrate the powerful attack capability and transferability of DarkSAM.
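
Two ideas in the abstract can be illustrated with a minimal sketch: a single perturbation shared by every image, and the high-frequency (texture) component of an image. The sketch below is not the DarkSAM implementation. The function names `apply_uap` and `high_frequency`, the budget `eps`, the cutoff `radius`, and the use of an FFT high-pass mask to isolate texture are all assumptions made for illustration, and the frequency decomposition used in the paper may differ.

```python
import numpy as np

def apply_uap(images, delta, eps=8 / 255):
    """Add one shared perturbation to a batch of images in [0, 1].

    `eps` is an illustrative L-infinity budget, not a value from the source.
    """
    delta = np.clip(delta, -eps, eps)          # enforce the perturbation budget
    return np.clip(images + delta, 0.0, 1.0)   # keep pixels in the valid range

def high_frequency(image, radius=8):
    """Return the high-frequency part of a 2-D grayscale image.

    Assumption: texture is approximated by zeroing a low-frequency disk
    of the given radius around the centre of the shifted FFT spectrum.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    spectrum[low] = 0                          # drop low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
```

In an attack of the kind the abstract describes, `delta` would be optimized over many images and prompts, and a frequency-domain loss would compare the high-frequency parts of benign and perturbed images. Both steps are omitted here because the abstract does not specify them.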

Paper Structure

This paper contains 25 sections, 10 equations, 19 figures, 6 tables, and 1 algorithm.

Figures (19)

  • Figure 1: Illustration of fooling SAM using a UAP
  • Figure 2: Illustration of the proposed shadow target strategy
  • Figure 3: The framework of DarkSAM
  • Figure 4: The ASR (%) in the transferability study. (a) explores the impact of the frequency attack on boosting the cross-domain transferability of UAPs. (b)-(e) show the results of the cross-model transferability study. "Point-HQ" and "Box-HQ" denote the results of HQ-SAM under point and box prompts, while the suffix "-PER" denotes the corresponding results for PerSAM. Each row represents the same UAP.
  • Figure 5: Visualizations of SAM segmentation results for adversarial examples across four datasets. The first four columns and the middle four columns display the segmentation results for point and box prompts, respectively. The last three columns show results under the segment everything mode for benign examples, as well as adversarial examples created using point and box prompts, respectively.
  • ...and 14 more figures