Interactive Simulations of Backdoors in Neural Networks

Peter Bajcsy; Maxime Bros

TL;DR

This work introduces an interactive web-based platform to study cryptographic backdoors in neural networks, focusing on checksum-based backdoors injected into digital signature verification and activation functions. It formalizes a simple checksum $csum(v)$ and demonstrates how backdoors can be triggered via secret keys, while also implementing a proximity-based defense to detect adversarial inputs. The results illustrate both the feasibility and limitations of such backdoors in small-scale networks, as well as practical constraints for web-based interactivity and robustness. The framework serves as an educational and research tool to explore planting, activation, and defense dynamics of cryptographic backdoors in AI systems, with potential implications for model integrity in practice.

Abstract

This work addresses the problem of planting and defending against cryptography-based backdoors in artificial intelligence (AI) models. The motivation is our limited understanding of the implications of using cryptographic techniques, which yield undetectable backdoors under theoretical assumptions, in the large AI model systems deployed in practice. Our approach is to design a web-based simulation playground that enables planting, activating, and defending against cryptographic backdoors in neural networks (NNs). Planting and activating backdoors can be simulated in two scenarios: an extension of the NN model architecture that supports digital signature verification, and a modified architectural block for non-linear operators. Defense simulations are based on proximity analysis and provide a playground for a game of planting and defending against backdoors. The simulations are available at https://pages.nist.gov/nn-calculator
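
The planting, activation, and proximity-based defense described in the abstract can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy checksum `csum` (the paper's actual definition is not reproduced here), the stand-in classifier `clean_model`, the key value, and the neighbour-comparison rule in `looks_suspicious`. It is not the playground's implementation.

```python
from typing import Callable, Optional, Sequence, Tuple

SECRET_KEY = 113  # hypothetical key chosen for this sketch


def csum(values: Sequence[float]) -> int:
    """Toy checksum (an assumption): sum of the features quantized to
    hundredths, reduced modulo 256."""
    return sum(round(c * 100) for c in values) % 256


def clean_model(x: float, y: float) -> int:
    """Stand-in for a trained two-class NN: label by the sign of x."""
    return 1 if x >= 0 else -1


def backdoored_model(x: float, y: float, key: int = SECRET_KEY) -> int:
    """Planting: flip the clean label when the input checksum equals the key."""
    label = clean_model(x, y)
    return -label if csum((x, y)) == key else label


def find_trigger(x: float, y: float,
                 key: int = SECRET_KEY) -> Optional[Tuple[float, float]]:
    """Activation: with knowledge of the key, shift y in steps of 0.01 until
    the checksum matches. Intended for inputs on the 0.01 grid."""
    for i in range(256):
        candidate = (x, y + i * 0.01)
        if csum(candidate) == key:
            return candidate
    return None


def looks_suspicious(model: Callable[[float, float], int],
                     x: float, y: float, eps: float = 0.01) -> bool:
    """Defense: flag a point whose label disagrees with all four of its
    axis-aligned neighbours at distance eps."""
    center = model(x, y)
    offsets = ((eps, 0.0), (-eps, 0.0), (0.0, eps), (0.0, -eps))
    return all(model(x + dx, y + dy) != center for dx, dy in offsets)


if __name__ == "__main__":
    trigger = find_trigger(0.5, 0.0)
    print("trigger point:", trigger)
    print("clean label:", clean_model(*trigger))
    print("backdoored label:", backdoored_model(*trigger))
    print("flagged by defense:", looks_suspicious(backdoored_model, *trigger))
```

The defense in this sketch works only because the toy checksum changes between neighbouring grid points, so a triggered point is an isolated label flip. A checksum that stays constant over a neighbourhood would not be caught by this rule.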

Paper Structure (15 sections, 6 equations, 7 figures, 1 table)

Introduction
Methods
  Simple Checksum
  Backdoor in Checksum-Based Digital Signature Verification
  Checksum-Based Backdoor in Activation Functions of Model Nodes
  Defense Against Checksum-Based Backdoor
Results
  Backdoor in Checksum-Based Digital Signature Verification
  Checksum-Based Backdoor in Activation Functions
  Defense Against Checksum-Based Backdoor
Discussion
  Planting Backdoors
  Activation of Backdoors
  Defense Against Backdoors
Conclusion

Figures (7)

Figure 1: An overview of planting and activating a backdoor in a trained neural network. The simulation playground enables training a two-class fully-connected neural network (NN) with input features derived from 2D points, planting a checksum-based backdoor in the NN model, and activating the backdoor based on the knowledge of a secret key.
Figure 2: An overview of a neural network with digital signature verification as described by Goldwasser et al. (2022). A backdoor is planted in the output linear layer to flip the output label when the checksum (CSUM) of the input value matches a secret key.
Figure 3: A schematic of the computations in each NN node with a rectified linear unit (ReLU). The total input of a node is computed from two input features, $x$ and $y$, as shown at the top. The first NN layer, with four nodes, two inputs, and the ReLU activation function, is shown at the bottom.
Figure 4: An illustration of the impact of a checksum function (top) with the secret key (red line) on the ReLU activation function (bottom). The arrows show the locations where the ReLU function will be affected by the checksum and the output sign will be flipped.
Figure 5: Histogram of checksum values for all test data points (left). Given a secret key (red arrow), all test points with the checksum equal to the secret key will flip their labels (13 labels are flipped for the secret key equal to 113).
...and 2 more figures
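
The node computation and checksum-triggered activation described in the captions of Figures 3 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the toy checksum `csum`, the weights, the test grid, and the choice to negate the ReLU output on a key match are all hypothetical, and the resulting counts are properties of this toy setup, not results from the paper.

```python
from collections import Counter
from typing import Sequence


def csum(values: Sequence[float]) -> int:
    """Toy checksum (an assumption): sum of the values quantized to
    hundredths, reduced modulo 256."""
    return sum(round(c * 100) for c in values) % 256


def total_input(x: float, y: float, w1: float, w2: float, b: float) -> float:
    """Total input of one node for the two input features x and y."""
    return w1 * x + w2 * y + b


def relu(z: float) -> float:
    """Standard rectified linear unit."""
    return max(0.0, z)


def backdoored_relu(z: float, key: int) -> float:
    """ReLU whose output sign is flipped when the checksum of the total
    input equals the secret key (the negation rule is an assumption)."""
    out = relu(z)
    return -out if csum([z]) == key else out


def checksum_histogram(w1: float = 1.0, w2: float = 1.0,
                       b: float = 0.0) -> Counter:
    """Histogram of checksum values of the node's total input over a
    hypothetical 100 x 100 grid of test points in [0, 1) x [0, 1)."""
    hist = Counter()
    for i in range(100):
        for j in range(100):
            z = total_input(i / 100, j / 100, w1, w2, b)
            hist[csum([z])] += 1
    return hist


if __name__ == "__main__":
    key = 113  # hypothetical secret key
    hist = checksum_histogram()
    print("grid points whose checksum equals the key:", hist[key])
    print("activation without a key match:", backdoored_relu(1.14, key))
    print("activation with a key match:", backdoored_relu(1.13, key))
```

Negating the output has no effect when the total input is negative, because the ReLU output is already zero there; in this sketch the backdoor only changes nodes with a positive total input.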
