FSPGD: Rethinking Black-box Attacks on Semantic Segmentation

Eun-Sol Park, MiSo Park, Seung Park, Yong-Goo Shin

TL;DR

The paper tackles the limited transferability of black-box adversarial attacks on semantic segmentation by introducing FSPGD, which computes gradients from intermediate-layer features rather than from output predictions. It defines two feature-based losses, $L_{ex}$ and $L_{in}$, and combines them into a dynamically weighted objective $L=\lambda_t L_{ex}+(1-\lambda_t)L_{in}$ that drives local feature dissimilarity and contextual disruption; the internal term is based on a Gram matrix and uses a binarized spatial mask. Empirical results on Pascal VOC 2012 and Cityscapes show that FSPGD achieves state-of-the-art transferability across diverse backbones, including transformer-based architectures, and extensive ablations validate the benefits of attacking middle layers, of dynamic weighting, and of the threshold $\tau=\cos(\pi/3)$. The work provides new benchmarks for black-box segmentation attacks and highlights the importance of attacking intermediate representations for improving cross-model transferability.
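
The dynamic objective above can be illustrated with a minimal NumPy sketch. The summary does not give the exact loss definitions, so everything here beyond $L=\lambda_t L_{ex}+(1-\lambda_t)L_{in}$ and $\tau=\cos(\pi/3)$ is an assumption: features are (C, H, W) arrays; $L_{ex}$ is taken as the mean per-location cosine similarity between clean and adversarial features; $L_{in}$ is a masked mean of the adversarial Gram matrix, where the binary mask keeps location pairs whose clean features have cosine similarity above $\tau$; and `linear_lambda` is a hypothetical schedule. All function names are illustrative and are not taken from the FSPGD code.

```python
import numpy as np

TAU = np.cos(np.pi / 3)  # threshold from the summary: cos(pi/3) = 0.5


def _normalize(feat):
    """Flatten a (C, H, W) feature map to (H*W, C) and L2-normalize each location."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1).T  # one C-dimensional vector per spatial location
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / np.maximum(norms, 1e-12)


def external_loss(f_clean, f_adv):
    """Assumed L_ex: mean cosine similarity between clean and adversarial features."""
    a, b = _normalize(f_clean), _normalize(f_adv)
    return float(np.mean(np.sum(a * b, axis=1)))


def internal_loss(f_clean, f_adv, tau=TAU):
    """Assumed L_in: masked mean of the adversarial Gram matrix.

    The binarized mask keeps location pairs whose clean features are already
    similar (cosine > tau), i.e. pairs that plausibly belong to the same object.
    """
    a, b = _normalize(f_clean), _normalize(f_adv)
    mask = (a @ a.T > tau).astype(b.dtype)  # binarized spatial mask from clean features
    gram = b @ b.T                           # pairwise cosine similarities (Gram matrix)
    return float(np.sum(mask * gram) / max(np.sum(mask), 1.0))


def combined_loss(f_clean, f_adv, lam):
    """L = lam * L_ex + (1 - lam) * L_in; lower means stronger disruption here."""
    return lam * external_loss(f_clean, f_adv) + (1.0 - lam) * internal_loss(f_clean, f_adv)


def linear_lambda(t, num_steps):
    """Hypothetical schedule: shift weight from L_ex to L_in over the attack steps."""
    return 1.0 - t / max(num_steps - 1, 1)
```

In this sketch lower values of $L$ mean stronger disruption, so an attacker would step against its gradient. Building the mask from clean features is one plausible reading of "similar objects"; the paper may define the mask and the $\lambda_t$ schedule differently.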

Abstract

Transferability, the ability of adversarial examples crafted for one model to deceive other models, is crucial for black-box attacks. Despite advancements in attack methods for semantic segmentation, transferability remains limited, reducing their effectiveness in real-world applications. To address this, we introduce the Feature Similarity Projected Gradient Descent (FSPGD) attack, a novel black-box approach that enhances both attack performance and transferability. Unlike conventional segmentation attacks that rely on output predictions for gradient calculation, FSPGD computes gradients from intermediate layer features. Specifically, our method introduces a loss function that targets local information by comparing features between clean images and adversarial examples, while also disrupting contextual information by accounting for spatial relationships between objects. Experiments on Pascal VOC 2012 and Cityscapes datasets demonstrate that FSPGD achieves superior transferability and attack performance, establishing a new state-of-the-art benchmark. Code is available at https://github.com/KU-AIVS/FSPGD.
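
The core idea in the abstract, taking attack gradients from intermediate features instead of output predictions, can be sketched as an $L_\infty$-bounded PGD loop. This is a toy illustration, not the FSPGD implementation: the "intermediate layer" is assumed to be a linear map `f(x) = W @ x` so that the gradient of the cosine similarity can be written analytically, and the names `cosine_and_grad` and `feature_pgd`, as well as the default step sizes, are hypothetical.

```python
import numpy as np


def cosine_and_grad(u, v):
    """Return cos(u, v) and its gradient with respect to v."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    cos = float(u @ v / (nu * nv))
    grad_v = u / (nu * nv) - cos * v / (nv ** 2)
    return cos, grad_v


def feature_pgd(x_clean, W, eps=8 / 255, alpha=2 / 255, steps=10, seed=0):
    """L-inf PGD that lowers cosine similarity between intermediate features.

    Toy assumption: the intermediate layer is the linear map f(x) = W @ x.
    """
    rng = np.random.default_rng(seed)
    f_clean = W @ x_clean
    # Random start inside the eps-ball, as in standard PGD.
    x_adv = np.clip(x_clean + rng.uniform(-eps, eps, size=x_clean.shape), 0.0, 1.0)
    for _ in range(steps):
        _, grad_f = cosine_and_grad(f_clean, W @ x_adv)
        grad_x = W.T @ grad_f                                  # chain rule through f(x) = W x
        x_adv = x_adv - alpha * np.sign(grad_x)                # descend: reduce feature similarity
        x_adv = np.clip(x_adv, x_clean - eps, x_clean + eps)   # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                       # keep a valid image
    return x_adv
```

Because the objective is a similarity to be reduced, each step subtracts `alpha * sign(grad)`. A real attack would obtain the gradient by automatic differentiation through the chosen layer of the source model.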

Paper Structure

This paper contains 18 sections, 9 equations, 13 figures, 11 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Visualization of the feature similarity. We show a feature similarity map using the features of the bicycle-wheel area (red box) as the reference feature. In conventional methods, high feature similarity is observed with the other bicycle wheels (yellow and blue boxes), whereas in the proposed method, feature similarity is notably reduced. (a) Clean image, (b) PGD mkadry2017towards, (c) SegPGD gu2022segpgd, (d) CosPGD agnihotri2024cospgd, and (e) FSPGD (Ours).
  • Figure 1: Visualization of the feature similarity on the Pascal VOC 2012 dataset. Red boxes indicate the reference features, while yellow and blue boxes represent regions belonging to the same class as the red boxes. Deeplabv3-Res50 is used as the source model and Deeplabv3-Res101 as the target model. (a) Clean image, (b) PGD mkadry2017towards, (c) SegPGD gu2022segpgd, (d) CosPGD agnihotri2024cospgd, (e) FSPGD (Ours).
  • Figure 2: Overall framework of FSPGD. FSPGD employs a loss function with two components: an external and an internal feature similarity loss. The external feature similarity loss measures the similarity between intermediate-level features of the clean image and the adversarial example, whereas the internal feature similarity loss compares intermediate-level feature similarity among similar objects within the adversarial example.
  • Figure 2: Visualization of the feature similarity on the Pascal VOC 2012 dataset. Red boxes indicate the reference features, while yellow and blue boxes represent regions belonging to the same class as the red boxes. Deeplabv3-Res50 is used as the source model and Deeplabv3-Res101 as the target model. (a) Clean image, (b) PGD mkadry2017towards, (c) SegPGD gu2022segpgd, (d) CosPGD agnihotri2024cospgd, (e) FSPGD (Ours).
  • Figure 3: Visualization of experimental results. DV3Res50 is used as the source model. The first column shows clean images and adversarial examples generated by PGD mkadry2017towards, SegPGD gu2022segpgd, CosPGD agnihotri2024cospgd, and FSPGD (Ours); the second column shows the ground truth of the input images; the remaining columns show the predictions of the target models.
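
Figure 1 refers to a feature similarity map computed against a reference feature (red box). The captions do not specify how the map is computed; the sketch below assumes per-location cosine similarity against the mean feature inside the box. The function `feature_similarity_map` and the box convention `(y0, y1, x0, x1)` in feature-map coordinates are hypothetical.

```python
import numpy as np


def feature_similarity_map(feat, ref_box):
    """Cosine similarity between a reference feature and every spatial location.

    feat: (C, H, W) feature map; ref_box: (y0, y1, x0, x1) in feature-map coordinates.
    Assumption: the reference feature is the mean feature inside the box.
    """
    y0, y1, x0, x1 = ref_box
    ref = feat[:, y0:y1, x0:x1].reshape(feat.shape[0], -1).mean(axis=1)
    ref = ref / max(np.linalg.norm(ref), 1e-12)               # unit reference vector
    norms = np.maximum(np.linalg.norm(feat, axis=0), 1e-12)   # (H, W) per-location norms
    return np.einsum("c,chw->hw", ref, feat) / norms
```

Applied to a source model's intermediate features for a clean image and for an adversarial example, high values at same-class regions (the yellow and blue boxes) would indicate the residual similarity the captions attribute to conventional attacks, and lower values would match the reduction described for FSPGD.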