Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis

Kira Maag, Roman Resner, Asja Fischer

TL;DR

This paper addresses the vulnerability of semantic segmentation models to adversarial perturbations in safety-critical settings. It introduces an uncertainty-based detection framework that uses pixel-wise uncertainty measures, including the entropy $E(x)_z$, variation ratio $V(x)_z$, and margin $M(x)_z$, as post-processing features to distinguish benign from attacked inputs, without altering the segmentation model. Across Cityscapes with multiple CNN and transformer architectures and a wide set of attacks, the approach achieves high detection performance (average ADA* around $89.36\%$) and operates efficiently as a post-processing step. The method provides a robust, attack-agnostic baseline for uncertainty-based detection in semantic segmentation with practical implications for real-world deployment.

Abstract

Deep neural networks have demonstrated remarkable effectiveness across a wide range of tasks such as semantic segmentation. Nevertheless, these networks are vulnerable to adversarial attacks that add imperceptible perturbations to the input image, leading to false predictions. This vulnerability is particularly dangerous in safety-critical applications like automated driving. While adversarial examples and defense strategies are well-researched in the context of image classification, there is comparatively less research focused on semantic segmentation. Recently, we have proposed an uncertainty-based method for detecting adversarial attacks on neural networks for semantic segmentation. We observed that uncertainty, as measured by the entropy of the output distribution, behaves differently on clean versus adversely perturbed images, and we utilize this property to differentiate between the two. In this extended version of our work, we conduct a detailed analysis of uncertainty-based detection of adversarial attacks including a diverse set of adversarial attacks and various state-of-the-art neural networks. Our numerical experiments show the effectiveness of the proposed uncertainty-based detection method, which is lightweight and operates as a post-processing step, i.e., no model modifications or knowledge of the adversarial example generation process are required.
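The pixel-wise measures named in the TL;DR can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes the commonly used definitions: entropy normalized by the log of the number of classes, variation ratio as one minus the top softmax probability, and margin as one minus the gap between the two largest probabilities. The function names `pixelwise_uncertainty` and `mean_entropy_score` are hypothetical, and averaging over the image is only one possible way to aggregate a heatmap into a single feature.

```python
import numpy as np


def pixelwise_uncertainty(probs):
    """Compute per-pixel uncertainty maps from softmax output.

    probs: array of shape (H, W, C) whose last axis sums to 1.
    Returns (entropy, variation_ratio, margin), each of shape (H, W).
    The exact definitions are assumptions (standard forms), not taken
    from the paper's text.
    """
    eps = 1e-12  # avoids log(0) for zero-probability classes
    num_classes = probs.shape[-1]

    # Normalized Shannon entropy in [0, 1].
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1) / np.log(num_classes)

    # Two largest class probabilities per pixel (ascending sort, take the tail).
    top2 = np.sort(probs, axis=-1)[..., -2:]
    second, best = top2[..., 0], top2[..., 1]

    variation_ratio = 1.0 - best      # high when the top class is weak
    margin = 1.0 - best + second      # high when the top two classes are close
    return entropy, variation_ratio, margin


def mean_entropy_score(probs):
    """Hypothetical image-level feature: mean entropy over all pixels."""
    entropy, _, _ = pixelwise_uncertainty(probs)
    return float(entropy.mean())
```

A detector in this spirit would compute such features on clean and perturbed images and then separate the two groups, for example with a threshold or a small classifier. The segmentation model itself stays untouched, which matches the post-processing design described in the abstract.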

Paper Structure (18 sections, 7 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Semantic segmentation prediction and entropy heatmap for a benign image (top row) and adversarial example created by the FGSM attack Goodfellow2015 from that image (bottom row).
  • Figure 2: Schematic illustration of our detection method, based on the figure from visapp24. In the white-box setting, the attacker has full access to the semantic segmentation model. Uncertainty heatmaps are obtained from the network output and are used as input (either unfiltered or aggregated over the images) for the detection model, which classifies images as clean or perturbed.
  • Figure 3: Semantic segmentation prediction for a clean image (a) and perturbed images generated by different attacks (c)-(l), with corresponding entropy heatmaps.
  • Figure 4: APSR results for the Cityscapes dataset perturbed by various attacks.
  • Figure 5: Detection performance results for the DeepLabv3+ network.
  • ...and 2 more figures