
AGNES: Abstraction-guided Framework for Deep Neural Networks Security

Akshay Dhonthi, Marcello Eiermann, Ernst Moritz Hahn, Vahid Hashemi

TL;DR

This paper introduces AGNES, a tool to detect backdoors in DNNs for image recognition, and shows that the tool performs better than many state-of-the-art methods for multiple relevant case studies.

Abstract

Deep Neural Networks (DNNs) are becoming widespread, particularly in safety-critical areas. One prominent application is image recognition in autonomous driving, where the correct classification of objects, such as traffic signs, is essential for safe driving. Unfortunately, DNNs are prone to backdoors, meaning that they concentrate on attributes of the image that should be irrelevant for their correct classification. Backdoors are integrated into a DNN during training, either with malicious intent (such as a manipulated training process, because of which a yellow sticker always leads to a traffic sign being recognised as a stop sign) or unintentionally (such as a rural background leading to any traffic sign being recognised as an animal crossing sign, because of biased training data). In this paper, we introduce AGNES, a tool to detect backdoors in DNNs for image recognition. We discuss the principal approach on which AGNES is based. Afterwards, we show that our tool performs better than many state-of-the-art methods for multiple relevant case studies.

Paper Structure (13 sections, 6 figures, 4 tables, 1 algorithm)


Figures (6)

  • Figure 1: AGNES Framework. The colours on neurons represent various clusters. The neurons with the dark blue outline are the cluster representatives of each cluster. The neurons with red outlines undergo stimulation while the rest are skipped. The red neurons in the reverse engineering step are the compromised neurons.
  • Figure 2: The left image depicts the DNN structure, and the right image depicts the computation of a single neuron output. We show, as an example, computation for neuron $n_{21}$.
  • Figure 3: Obtaining cluster representative (CR) positions for one of the features in a convolutional layer. The red mark represents the CR.
  • Figure 4: AproxSM method. Here, $\odot$ is the Hadamard product operation and $v$ is the stimulation value.
  • Figure 5: Tool Framework
  • ...and 1 more figure