Protecting Feed-Forward Networks from Adversarial Attacks Using Predictive Coding

Ehsan Ganjidoost, Jeff Orchard

TL;DR

By integrating PCnets into feed-forward networks as a preprocessing step, this study substantially bolsters resilience to adversarial perturbations and holds promise for improving the security and reliability of neural network classifiers against the growing threat of adversarial attacks.

Abstract

An adversarial example is a modified input image designed to cause a Machine Learning (ML) model to make a mistake; the perturbations are often invisible or subtle to human observers, and they highlight vulnerabilities in a model's ability to generalize from its training data. Several adversarial attacks can create such examples, each differing in approach, effectiveness, and perceptibility of the changes. Defending against these attacks improves the robustness of ML models in image processing and other domains of deep learning. Most defence mechanisms require some knowledge of the model, changes to the model, or access to a comprehensive set of adversarial examples during training, which is impractical. Another option is to use an auxiliary model as a preprocessing step, leaving the primary model unchanged. This study presents a practical and effective solution: using predictive coding networks (PCnets) as an auxiliary step for adversarial defence. By integrating PCnets into feed-forward networks as a preprocessing step, we substantially bolster resilience to adversarial perturbations. Our experiments on MNIST and CIFAR10 show that PCnets mitigate adversarial examples, with improvements in robustness of about 82% and 65%, respectively. The PCnet, trained on a small subset of the dataset, leverages its generative nature to counter adversarial perturbations, reverting perturbed images closer to their original forms. This approach holds promise for improving the security and reliability of neural network classifiers against the growing threat of adversarial attacks.
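The preprocessing idea in the abstract can be sketched as plain function composition: an auxiliary model cleans the input, and the unchanged classifier then predicts on the cleaned input. The sketch below is illustrative only. The names `make_defended_classifier`, `toy_purifier`, and `toy_classifier` are hypothetical, and the toy functions are simple stand-ins, not a PCnet or the paper's FFnet.

```python
from typing import Callable, List

Image = List[float]  # flattened pixel intensities in [0, 1] (illustrative type)


def make_defended_classifier(
    purifier: Callable[[Image], Image],
    classifier: Callable[[Image], int],
) -> Callable[[Image], int]:
    """Compose a preprocessing model with a classifier that is left unchanged."""

    def defended(x: Image) -> int:
        # The classifier only ever sees the purified input.
        return classifier(purifier(x))

    return defended


def toy_purifier(x: Image) -> Image:
    # Stand-in for the auxiliary model (assumption): snap each pixel to 0 or 1,
    # which discards small perturbations of a binary image.
    return [1.0 if v >= 0.5 else 0.0 for v in x]


def toy_classifier(x: Image) -> int:
    # Stand-in for the primary model (assumption): class 1 if the image is
    # bright on average, otherwise class 0.
    return int(sum(x) / len(x) > 0.5)


clean = [1.0, 1.0, 1.0, 0.0, 0.0]
# A small perturbation that dims the bright pixels enough to flip the toy classifier.
perturbed = [0.7, 0.7, 0.7, 0.0, 0.0]

defended = make_defended_classifier(toy_purifier, toy_classifier)
print(toy_classifier(clean), toy_classifier(perturbed), defended(perturbed))
```

The point of the composition is that the primary model needs no retraining or modification; only the input it receives changes.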

Paper Structure

This paper contains 12 sections, 4 equations, 24 figures, 4 tables.

Figures (24)

  • Figure 1: FFnet's perception of the image changes when noise perturbs it. For the original image, FFnet outputs $\Pr(y_{0}=1|x)=0.99$; after perturbation, its output changes to $\Pr(y_{3}=1|x+\delta)=0.87$.
  • Figure 2: A typical PCnet arranged in a feed-forward manner. Each box represents a population of neurons containing value and error nodes.
  • Figure 3: PCnet perturbation is demonstrated using both the original and adversarial images. PCnet modifies the given input based on its trained dynamics. As shown in \ref{fig:imageBeforeAfterPC}, the original image $x$ is depicted on the left, while its perturbation $\mathrm{PCnet}(x)$ is shown on the right. Similarly, \ref{fig:advBeforeAfterPC} presents the adversarial image $z$ on the left, alongside its perturbation $p$ on the right.
  • Figure 4: The workflow first divides the dataset into two parts, $X$ and $\tilde{X}$. We then generate Adversarial Examples (AEs) by attacking the FFnet, represented as $\textbf{AT}: X \rightarrow Z$ and $\tilde{X} \rightarrow \tilde{Z}$. Finally, we assess the defence strategy against the attack, first using the AEs directly (i.e., $Z$ and $\tilde{Z}$), and then after the PCnet has adjusted them (i.e., $P$ and $\tilde{P}$).
  • Figure 5: An adversarial attack produces either a successful or a failed adversarial example (AE). The attacker may choose a target class that the FFnet should predict for the AE, or may accept any misclassification by the FFnet. For the given image of $0$, the attacker aimed for different targets, $8, 1, 3$, from left to right. However, FFnet's prediction may differ from the target. From left to right, the AEs are as follows: failed AE (aimed for $8$ and predicted $0$); non-targeted AE (aimed for $1$ and predicted $3$); targeted AE (aimed for $3$ and predicted $3$). Note that even if an AE is non-targeted, it is still a valid AE. Furthermore, the choice of target depends on the attack method.
  • ...and 19 more figures
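
The three outcomes described in the Figure 5 caption (failed, non-targeted, targeted) can be written as a small decision rule over the true label, the attacker's target, and the FFnet's prediction. This is a minimal sketch of that rule as the caption states it; the function name `categorize_adversarial_example` is hypothetical and not taken from the paper.

```python
from typing import Optional


def categorize_adversarial_example(
    true_label: int,
    target_label: Optional[int],
    predicted_label: int,
) -> str:
    """Label an attack outcome using the categories in the Figure 5 caption."""
    if predicted_label == true_label:
        # The classifier still predicts the original class: the attack failed.
        return "failed"
    if target_label is not None and predicted_label == target_label:
        # The classifier predicts exactly the class the attacker aimed for.
        return "targeted"
    # Misclassified, but not as the attacker's target: still a valid AE.
    return "non-targeted"


# The three cases from the caption, for an image whose true label is 0.
for target, predicted in [(8, 0), (1, 3), (3, 3)]:
    print(target, predicted, categorize_adversarial_example(0, target, predicted))
```

Passing `None` as the target is an added convenience for attacks that do not choose a target class; any misclassification then counts as non-targeted.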