Practical Region-level Attack against Segment Anything Models

Yifan Shen, Zhengyuan Li, Gang Wang

TL;DR

This work addresses the practical robustness of Segment Anything Models (SAM) by introducing region-level adversarial attacks that do not require knowledge of the exact user prompt. It presents Sampling-based Region Attack (S-RA) and Transferable Region Attack (T-RA), with T-RA leveraging Spectrum Transformation to improve black-box transferability across SAM variants. Extensive experiments across ViT-B/H/L backbones and multiple SAM variants (EfficientSAM, Fast-SAM, MobileSAM, HQ-SAM) show that T-RA can drastically reduce segmentation performance (mean IoU $<0.10$ in many black-box settings) and even degrade performance on a real-world SAM service, underscoring practical security concerns. The results motivate defenses such as adversarial training, input transformations, and more robust architectures, and suggest extending region-level attacks to other prompts and segmentation models for future work.

Abstract

Segment Anything Models (SAM) have made significant advancements in image segmentation, allowing users to segment target portions of an image with a single click (i.e., user prompt). Given SAM's broad applications, its robustness against adversarial attacks is a critical concern. While recent works have explored adversarial attacks against a pre-defined prompt/click, their threat model is not yet realistic: (1) they often assume the user-click position is known to the attacker (point-based attack), and (2) they often operate under a white-box setting with limited transferability. In this paper, we propose a more practical region-level attack where attackers do not need to know the precise user prompt. The attack remains effective when the user clicks on any point on the target object in the image, hiding the object from SAM. Also, by adapting a spectrum transformation method, we make the attack more transferable under a black-box setting. Both controlled experiments and testing against real-world SAM services confirm its effectiveness.
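
To make the region-level setting concrete, the following is a minimal evaluation sketch, not the authors' implementation. It assumes the target region is an axis-aligned box, that user clicks are drawn uniformly inside it, and that attack success is summarized as the mean IoU over those clicks. The names `sample_clicks_in_box`, `iou`, `mean_iou_over_clicks`, and the `segment_fn(image, click)` callable are hypothetical stand-ins for a real SAM predictor.

```python
import numpy as np


def sample_clicks_in_box(box, num_clicks, rng):
    """Draw (x, y) click points uniformly from a box (x0, y0, x1, y1).

    Assumption: the box uses exclusive upper bounds, like array slicing.
    """
    x0, y0, x1, y1 = box
    xs = rng.integers(x0, x1, size=num_clicks)
    ys = rng.integers(y0, y1, size=num_clicks)
    return np.stack([xs, ys], axis=1)


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        # Both masks are empty; treat them as a perfect match.
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)


def mean_iou_over_clicks(segment_fn, image, true_mask, clicks):
    """Average IoU when the same image is prompted at each click.

    segment_fn is a hypothetical callable (image, (x, y)) -> binary mask.
    """
    scores = [iou(segment_fn(image, (int(x), int(y))), true_mask) for x, y in clicks]
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mask = np.zeros((8, 8), dtype=bool)
    true_mask[2:6, 2:6] = True  # toy object occupying the target box
    clicks = sample_clicks_in_box((2, 2, 6, 6), num_clicks=16, rng=rng)

    # Toy stand-ins: a segmenter that always finds the object,
    # and one that returns an empty mask (the object is "hidden").
    finds_object = lambda image, click: true_mask
    hides_object = lambda image, click: np.zeros_like(true_mask)

    print(mean_iou_over_clicks(finds_object, None, true_mask, clicks))
    print(mean_iou_over_clicks(hides_object, None, true_mask, clicks))
```

Averaging over many sampled clicks, rather than scoring a single fixed click, is what distinguishes a region-level evaluation from a point-based one in this sketch.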

Paper Structure (27 sections, 9 equations, 5 figures, 6 tables, 1 algorithm)

Figures (5)

  • Figure 1: Region-level Attack on a Segment Anything Model (SAM). The left image shows the original clean image: objects are well segmented when a user clicks on the object region (user clicks are denoted by green stars). The right image shows the attacked image: the corgi in the yellow box (attack-target region) can no longer be identified by SAM no matter where the user clicks within the box. Note that the regions outside the yellow box are not affected by the attack.
  • Figure 2: Image segmentation results under different attack methods on the ViT-B model. The left image is the original clean image. The middle image is attacked by S-RA and the right image is attacked by T-RA. The attack strength is $\epsilon = 8/255$.
  • Figure 3: Visualization of white-box and black-box attack results (attack strength $\epsilon = 8/255$). The first row shows the original clean images segmented using the ViT-B model. The second row shows S-RA (white-box) computed on the ViT-B model and the segmentation results on the same ViT-B model. The result confirms the effectiveness of S-RA under a white-box setting. The third row shows S-RA (black-box) computed on the ViT-B model and the segmentation results on a different ViT-H model. The result shows the lack of transferability of S-RA under a black-box setting. The fourth row shows T-RA (black-box) computed on the ViT-B model and the segmentation results on a different ViT-H model. The result shows that T-RA transfers well: ViT-H cannot segment correctly under this attack.
  • Figure 4: Visualization of the segmentation results under different $\rho$ values for T-RA under a black-box setting (computed on ViT-B; tested on ViT-H). The attack strength is fixed at $\epsilon = 4/255$. The first column shows the original clean image's segmentation result. The second column is a zoomed-in view to highlight the mask on the clean image. The subsequent columns display the segmentation results for $\rho$ values of 0.01, 0.1, 0.2, and 0.3, respectively. We find that $\rho = 0.1$ results in the most effective degradation of segmentation accuracy.
  • Figure 5: Visualization of adversarial examples with $\epsilon = 8/255$ and $\rho = 0.1$, computed on ViT-B and tested on a real-world SAM service. The blue dot is the test point and the highlighted area is the output mask.
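
The captions above quote attack strengths of $\epsilon = 4/255$ and $\epsilon = 8/255$. As a rough illustration of what such a budget means, here is a minimal sketch that assumes $\epsilon$ is an $L_\infty$ bound on the per-pixel perturbation and that images are floats scaled to $[0, 1]$; neither assumption is stated in this summary. The function name `project_linf` is hypothetical.

```python
import numpy as np


def project_linf(x_adv, x_clean, eps):
    """Project x_adv back into the L-infinity ball of radius eps around x_clean.

    Assumptions (not taken from the paper): pixel values are floats in [0, 1]
    and eps is a per-pixel bound, e.g. 8/255.
    """
    # Limit every pixel's change to at most eps in absolute value.
    delta = np.clip(x_adv - x_clean, -eps, eps)
    # Keep the result a valid image.
    return np.clip(x_clean + delta, 0.0, 1.0)


if __name__ == "__main__":
    eps = 8 / 255
    x_clean = np.full((4, 4, 3), 0.5)
    x_adv = x_clean + 0.2  # an update step that overshoots the budget
    x_proj = project_linf(x_adv, x_clean, eps)
    print(np.abs(x_proj - x_clean).max() <= eps + 1e-12)
```

Iterative attacks commonly apply a projection like this after every update step, so the perturbation stays within the stated strength however many steps are taken.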