Uncertainty-based Detection of Adversarial Attacks in Semantic Segmentation

Kira Maag, Asja Fischer

TL;DR

This work introduces an uncertainty-based approach for the detection of adversarial attacks in semantic segmentation and demonstrates the ability of this approach to detect perturbed images across multiple types of adversarial attacks.

Abstract

State-of-the-art deep neural networks have proven to be highly powerful in a broad range of tasks, including semantic image segmentation. However, these networks are vulnerable to adversarial attacks, i.e., imperceptible perturbations added to the input image that cause incorrect predictions, which is hazardous in safety-critical applications like automated driving. Adversarial examples and defense strategies are well studied for the image classification task, while there has been limited research in the context of semantic segmentation. Initial studies show, however, that the segmentation outcome can be severely distorted by adversarial attacks. In this work, we introduce an uncertainty-based approach for the detection of adversarial attacks in semantic segmentation. We observe that uncertainty, as captured for example by the entropy of the output distribution, behaves differently on clean and perturbed images, and we leverage this property to distinguish between the two cases. Our method works in a lightweight, post-processing manner, i.e., we neither modify the model nor need knowledge of the process used for generating adversarial examples. In a thorough empirical analysis, we demonstrate the ability of our approach to detect perturbed images across multiple types of adversarial attacks.
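
To make the entropy signal mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the segmentation network returns per-pixel logits of shape (height, width, classes). The function names, the normalization by log(number of classes), and the use of the image-wide mean as a single score are all illustrative assumptions. In particular, the abstract only states that entropy behaves differently on clean and perturbed images, so the direction of the comparison in `is_flagged` is a hypothetical parameter rather than a reported finding.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def pixelwise_entropy(probs, eps=1e-12):
    """Shannon entropy per pixel, scaled to [0, 1] by log(num_classes).

    probs: array of shape (H, W, C) holding a class distribution per pixel.
    """
    num_classes = probs.shape[-1]
    # Clip before the log so that zero probabilities contribute 0, not NaN.
    log_probs = np.log(np.clip(probs, eps, 1.0))
    entropy = -(probs * log_probs).sum(axis=-1)
    return entropy / np.log(num_classes)


def mean_entropy_score(logits):
    """Aggregate the entropy heatmap into one image-level score (assumption)."""
    return float(pixelwise_entropy(softmax(logits)).mean())


def is_flagged(score, threshold, higher_is_suspicious=True):
    """Hypothetical threshold rule; the comparison direction is an assumption
    that would have to be chosen from held-out clean and perturbed images."""
    return score > threshold if higher_is_suspicious else score < threshold
```

In practice the threshold (or a small classifier over several such aggregated features) would be fitted on data, since nothing in this sketch fixes how the scores of clean and perturbed images are distributed.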

Paper Structure (18 sections, 11 equations, 12 figures, 2 tables)


Figures (12)

  • Figure 1: Semantic segmentation prediction and entropy heatmap for clean (left) and perturbed image (right) generated by a dynamic target attack for hiding pedestrians.
  • Figure 2: Schematic illustration of our detection method. The adversarial attacker can have full access to the semantic segmentation model. Information from the network output is extracted to construct the features which serve as input to the detector model classifying between clean and perturbed images.
  • Figure 3: Input image (a) with corresponding ground truth (e). Semantic segmentation prediction for clean (b) and perturbed image generated by an untargeted (c) and a targeted FGSM attack (d) as well as by SSMM (f), DNNM (g) and patch attack (h).
  • Figure 4: APSR results for the Cityscapes dataset and both networks perturbed by different attacks.
  • Figure 5: Detection performance results for the VOC dataset and the DeepLabv3+ network.
  • ...and 7 more figures