AGNES: Abstraction-guided Framework for Deep Neural Networks Security

Akshay Dhonthi; Marcello Eiermann; Ernst Moritz Hahn; Vahid Hashemi

TL;DR

This paper introduces AGNES, a tool to detect backdoors in DNNs for image recognition, and shows that the tool performs better than many state-of-the-art methods for multiple relevant case studies.

Abstract

Deep Neural Networks (DNNs) are becoming widespread, particularly in safety-critical areas. One prominent application is image recognition in autonomous driving, where the correct classification of objects, such as traffic signs, is essential for safe driving. Unfortunately, DNNs are prone to backdoors, meaning that they concentrate on attributes of the image that should be irrelevant for their correct classification. Backdoors are integrated into a DNN during training, either maliciously (for example, through a manipulated training process that causes any traffic sign bearing a yellow sticker to be recognised as a stop sign) or unintentionally (for example, through biased training data that causes any traffic sign in front of a rural background to be recognised as an animal crossing). In this paper, we introduce AGNES, a tool to detect backdoors in DNNs for image recognition. We discuss the principle on which AGNES is based. Afterwards, we show that our tool performs better than many state-of-the-art methods on multiple relevant case studies.

Paper Structure (13 sections, 6 figures, 4 tables, 1 algorithm)

Introduction
Preliminaries
Deep Neural Networks
Abstraction
Stimulation
Methodology
Identifying Cluster Representatives
Stimulation Techniques
Tool Architecture
Experiments
Backdoor identification and runtime reduction
Performance on various triggers
Conclusion

Figures (6)

Figure 1: AGNES Framework. The colours on neurons represent various clusters. The neurons with the dark blue outline are cluster representatives of each cluster. The neurons with red outlines undergo stimulation while the rest are skipped. The red neurons in the reverse engineering step are the compromised neurons
Figure 2: The left image depicts the DNN structure, and the right image depicts the computation of a single neuron output. We show, as an example, computation for neuron $n_{21}$.
Figure 3: Obtaining CR positions for one of the features in a convolutional layer. The red mark represents the CR.
Figure 4: AproxSM method. Here, $\odot$ is the Hadamard product operation. $v$ is the stimulation value
Figure 5: Tool Framework
...and 1 more figure
