Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement

Zheng Yuan, Jie Zhang, Yude Wang, Shiguang Shan, Xilin Chen

TL;DR

Patch-based adversarial attacks pose a practical threat to semantic segmentation. The threat is amplified by the broad receptive fields of attention, which let a localized patch influence the predictions at distant pixels. The authors propose a Robust Attention Mechanism (RAM) that refines the attention matrix with two modules, Max Attention Suppression (MAS) and Random Attention Dropout (RAD), limiting the spread of the patch while maintaining performance on clean inputs. Experiments on ADE20K, VOC2012, and Cityscapes show substantial robustness gains for both CNN and ViT backbones: under targeted attacks, the mIoU measured against the attacker's target labels drops by up to roughly 20%, and the improvements hold across multiple attack methods. RAM can also be combined with adversarial training, making it a lightweight and broadly applicable defense for attention-based semantic segmentation models.

Abstract

The attention mechanism has proven effective on various visual tasks in recent years. In semantic segmentation, it is used by many methods, with both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) as backbones. However, we observe that the attention mechanism is vulnerable to patch-based adversarial attacks. Through an analysis of the effective receptive field, we attribute this vulnerability to the wide receptive field brought by global attention, which may allow the influence of the adversarial patch to spread. To address this issue, we propose a Robust Attention Mechanism (RAM) that improves the robustness of semantic segmentation models and notably relieves their vulnerability to patch-based attacks. Compared to the vanilla attention mechanism, RAM introduces two novel modules, Max Attention Suppression and Random Attention Dropout, both of which refine the attention matrix and limit the influence of a single adversarial patch on the segmentation results at other positions. Extensive experiments demonstrate the effectiveness of RAM in improving the robustness of semantic segmentation models against various patch-based attack methods under different attack settings.
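
The idea of refining the attention matrix can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact form of either module, so the sketch assumes that Max Attention Suppression clips every attention weight to a cap and that Random Attention Dropout zeroes a random subset of weights. The function name `refined_attention` and the parameters `cap` and `drop_prob` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def refined_attention(q, k, v, cap=0.2, drop_prob=0.1, rng=None):
    """Hypothetical attention refinement (assumed forms, not the paper's).

    q, k, v: arrays of shape (N, d) for N spatial positions.
    cap: assumed upper bound on any single attention weight.
    drop_prob: assumed probability of zeroing an attention weight.
    """
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (N, N), rows sum to 1

    # Assumed suppression step: no single key (e.g. a patch location)
    # may contribute more than `cap` to any query position.
    attn = np.minimum(attn, cap)

    # Assumed dropout step: randomly cut query-key connections.
    if drop_prob > 0:
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(attn.shape) >= drop_prob
        attn = attn * keep

    # Rows are deliberately not renormalized: rescaling would let a
    # clipped weight grow back above the cap.
    return attn @ v, attn
```

With `cap=1.0` and `drop_prob=0.0` the function reduces to ordinary scaled dot-product attention, which makes it easy to compare against a baseline.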

Paper Structure (40 sections, 26 equations, 5 figures, 16 tables, 1 algorithm)

Figures (5)

  • Figure 1: A visualization example of the two targeted attack settings, Permute and Strip. Permute replaces each label in the ground-truth segmentation, one by one, with a label that does not appear in it. Strip uses a strip pattern of randomly selected labels as the target segmentation label.
  • Figure 2: The structure of our proposed Robust Attention Mechanism (RAM), which introduces two novel modules called Max Attention Suppression (MAS) and Random Attention Dropout (RAD).
  • Figure 3: Visualization of attention maps. The first column shows the clean image and the second column shows the attention map of the clean image. The third and fourth columns show the attention maps from the feature at the adversarial patch location to all locations in the image, generated by the baseline method and our proposed RAM method, respectively. Redder colors indicate greater influence.
  • Figure 4: Visualization of the segmentation results of the Nonlocal [wang2018nonlocal] / R50 [he2016deep] model with our proposed RAM and of the baseline model under the Permute attack setting.
  • Figure 5: More visualizations about the comparison of segmentation results of different models.
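
The two target settings described in the Figure 1 caption can be sketched as follows. This is only an illustration built from the caption: the function names, the vertical orientation of the strips, the `strip_width` parameter, and the lack of handling for an ignore label are assumptions, not details taken from the paper.

```python
import numpy as np

def permute_target(gt, num_classes, rng):
    """Map each label present in `gt` to a distinct label absent from it."""
    present = np.unique(gt)
    absent = np.setdiff1d(np.arange(num_classes), present)
    if len(absent) < len(present):
        raise ValueError("not enough unused labels for a one-to-one swap")
    # Pick one unused label per present label, without repetition.
    chosen = rng.choice(absent, size=len(present), replace=False)
    lut = np.arange(num_classes)   # lookup table: old label -> new label
    lut[present] = chosen
    return lut[gt]

def strip_target(shape, num_classes, strip_width, rng):
    """Build a target map of vertical strips with randomly chosen labels."""
    h, w = shape
    n_strips = -(-w // strip_width)               # ceiling division
    labels = rng.integers(0, num_classes, size=n_strips)
    cols = np.repeat(labels, strip_width)[:w]     # one label per column
    return np.tile(cols, (h, 1))                  # same pattern in every row

rng = np.random.default_rng(0)
gt = np.array([[0, 0, 1],
               [2, 1, 1]])
permuted = permute_target(gt, num_classes=10, rng=rng)
strips = strip_target((4, 10), num_classes=10, strip_width=3, rng=rng)
```

In the Permute sketch, regions keep their shapes and only their class identities change, whereas the Strip sketch ignores the image content entirely.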
