SAM Meets UAP: Attacking Segment Anything Model With Universal Adversarial Perturbation

Dongshen Han; Chaoning Zhang; Sheng Zheng; Chang Lu; Yang Yang; Heng Tao Shen

TL;DR

The study investigates whether Segment Anything Model (SAM) can be attacked with a single universal adversarial perturbation (UAP). It shifts from image-centric to perturbation-centric optimization and adopts a self-supervised contrastive learning framework where the UAP serves as the anchor, a positively augmented version of the UAP acts as the positive sample, and a memory bank stores diverse negatives from the image encoder, optimized via an InfoNCE loss. Empirical results show that augmenting the UAP with natural images yields the strongest universal attack, causing the mean IoU across 100 test images to drop on both point and box prompts, with qualitative results revealing shrinkage or expansion of masks depending on the prompt type. This work demonstrates augmentation-invariant vulnerabilities in SAM and provides actionable insights into how UAPs can be constructed against prompt-guided segmentation models, potentially guiding future defenses.
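
The mean IoU figure mentioned above can be made concrete with a small sketch. It assumes the metric compares the mask predicted on the clean image with the mask predicted on the perturbed image for the same prompt; that pairing, and the helper names `iou` and `mean_iou`, are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (an assumption).
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

def mean_iou(clean_masks, adv_masks):
    """Average IoU over paired clean/adversarial masks."""
    return float(np.mean([iou(c, a) for c, a in zip(clean_masks, adv_masks)]))

# Toy 4x4 example: the clean mask is a 2x2 block.
clean = np.zeros((4, 4), dtype=bool)
clean[1:3, 1:3] = True

shrunk = np.zeros((4, 4), dtype=bool)   # mask shrinks to a single pixel
shrunk[1, 1] = True

expanded = np.ones((4, 4), dtype=bool)  # mask expands to the whole image

score = mean_iou([clean, clean], [shrunk, expanded])
```

In this toy case both the shrunken and the expanded mask score an IoU of 0.25 against the clean mask, so either failure mode lowers the mean in the same way.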

Abstract

As the Segment Anything Model (SAM) becomes a popular foundation model in computer vision, its adversarial robustness has become a concern that cannot be ignored. This work investigates whether it is possible to attack SAM with an image-agnostic Universal Adversarial Perturbation (UAP). In other words, we seek a single perturbation that can fool SAM into predicting invalid masks for most (if not all) images. We demonstrate that the conventional image-centric attack framework is effective for image-independent attacks but fails for universal adversarial attacks. To this end, we propose a novel perturbation-centric framework that results in a UAP generation method based on self-supervised contrastive learning (CL), where the UAP serves as the anchor sample and the positive sample is augmented from the UAP. The representations of negative samples are obtained from the image encoder in advance and saved in a memory bank. The effectiveness of our proposed CL-based UAP generation method is validated by both quantitative and qualitative results. In addition to an ablation study of the components of our proposed method, we shed light on the roles of positive and negative samples in making the generated UAP effective for attacking SAM.
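
The contrastive objective described above can be illustrated with a minimal NumPy sketch of an InfoNCE loss for one anchor, one positive, and a bank of negatives. This is the standard InfoNCE form, not the authors' code: the cosine similarity, the temperature of 0.1, the feature dimension, and the random vectors standing in for image-encoder features are all assumptions made for illustration.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor, one positive, K negatives.

    anchor, positive: 1-D feature vectors of length D.
    negatives: array of shape (K, D), e.g. rows of a memory bank.
    Cosine similarity and the default temperature are assumptions.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    p = normalize(np.asarray(positive, dtype=float))
    n = normalize(np.asarray(negatives, dtype=float))

    pos_logit = float(a @ p) / temperature
    neg_logits = (n @ a) / temperature
    logits = np.concatenate(([pos_logit], neg_logits))

    # Numerically stable log-sum-exp over the positive and all negatives.
    m = logits.max()
    log_denominator = m + np.log(np.exp(logits - m).sum())
    return float(log_denominator - pos_logit)

# Hypothetical stand-ins for encoder features (random, for illustration only).
rng = np.random.default_rng(0)
anchor_feat = rng.normal(size=64)            # feature of the UAP (anchor)
memory_bank = rng.normal(size=(16, 64))      # stored negative features

near_positive = anchor_feat + 0.01 * rng.normal(size=64)
far_positive = rng.normal(size=64)

loss_aligned = info_nce(anchor_feat, near_positive, memory_bank)
loss_unaligned = info_nce(anchor_feat, far_positive, memory_bank)
```

The loss is small when the positive lies close to the anchor and far from the negatives, and grows when the positive is no closer to the anchor than the stored negatives are. In a real attack the features would come from SAM's image encoder and the loss would be minimized with respect to the perturbation, which this sketch does not attempt.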

Paper Structure (15 sections, 3 equations, 4 figures, 5 tables)

This paper contains 15 sections, 3 equations, 4 figures, 5 tables.

Introduction
Related works
Background and Problem Formulation
Prompt-guided Image Segmentation
Universal Adversarial Attack on SAM
Implementation details
Method
Existing Image-Centric Attack Framework
Proposed Perturbation-Centric Attack Framework
Experimental Results and Analysis
Towards Finding Effective Augmentation
Qualitative Results
Ablation Study
Discussion
Conclusion

Figures (4)

Figure 1: Difference between image-centric (left) and perturbation-centric (right) attack frameworks.
Figure 2: Qualitative results under point prompts. Columns (a) and (b) show the clean and adversarial images with the point prompt marked by a green star; their predicted masks are shown in columns (c) and (d), respectively. The UAP makes the mask invalid by removing it (or making it smaller).
Figure 3: Qualitative results under box prompts. Columns (a) and (b) show the clean and adversarial images with the box prompt marked by green lines; their predicted masks are shown in columns (c) and (d), respectively. The UAP makes the mask invalid by making it larger and blurry.
Figure 4: The mIoU (%) results for different weights of the augmented images.
