Intrinsic Biologically Plausible Adversarial Robustness

Matilde Tristany Farinha; Thomas Ortner; Giorgia Dellaferrera; Benjamin Grewe; Angeliki Pantazi

TL;DR

This study asks whether biologically plausible learning, exemplified by PEPITA, yields greater intrinsic adversarial robustness than standard backpropagation (BP). Comparing BP and PEPITA on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 across natural and adversarial regimes, the authors show that PEPITA has higher intrinsic robustness and a more favorable natural-vs-adversarial performance trade-off, especially under PGD adversarial training. The findings suggest that alternative feedback pathways, implemented in PEPITA as fixed random projections, can mitigate vulnerability to adversarial perturbations. The advantage extends to fast (FGSM-based) adversarial training, where PEPITA generalizes better to stronger attacks. Overall, the work provides empirical evidence that biologically inspired learning dynamics can lead to more robust neural networks, and it motivates a deeper, theory-backed exploration of feedback mechanisms in adversarial settings.

Abstract

Artificial Neural Networks (ANNs) trained with Backpropagation (BP) excel at a variety of everyday tasks but have a dangerous vulnerability: inputs with small targeted perturbations, known as adversarial samples, can drastically degrade their performance. Adversarial training, a technique in which the training dataset is augmented with exemplary adversarial samples, has been shown to mitigate this problem but comes at a high computational cost. In contrast to ANNs, humans are not susceptible to misclassifying these same adversarial samples. Thus, one can postulate that ANNs trained with biologically plausible learning algorithms might be more robust against adversarial attacks. In this work, we choose the biologically plausible learning algorithm Present the Error to Perturb the Input To modulate Activity (PEPITA) as a case study and investigate this question through a comparative analysis with BP-trained ANNs on various computer vision tasks. We observe that PEPITA has a higher intrinsic adversarial robustness and, when adversarially trained, also has a more favorable natural-vs-adversarial performance trade-off. In particular, for the same natural accuracies on the MNIST task, PEPITA's adversarial accuracies decrease by only 0.26% on average, while BP's decrease by 8.05%.
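To make the notion of an adversarial sample concrete, the following minimal sketch crafts FGSM and PGD perturbations for a toy linear softmax classifier. It is an illustration under stated assumptions, not the paper's setup: the weights `W`, bias `b`, input `x`, label `y`, and the values of `eps`, `step`, and `n_steps` are all hypothetical, and the classifier is linear so that the input gradient can be written by hand without an autodiff library.

```python
import math

def softmax(z):
    m = max(z)                                   # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, b, x):
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def loss(W, b, x, y):
    # Cross-entropy loss of the true class y.
    return -math.log(softmax(logits(W, b, x))[y])

def input_grad(W, b, x, y):
    # For a linear softmax model, dL/dx = W^T (p - onehot(y)).
    p = softmax(logits(W, b, x))
    p[y] -= 1.0
    return [sum(p[i] * W[i][j] for i in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(W, b, x, y, eps):
    # Single step of size eps along the sign of the input gradient.
    g = input_grad(W, b, x, y)
    return [clip(xj + eps * sign(gj), 0.0, 1.0) for xj, gj in zip(x, g)]

def pgd(W, b, x, y, eps, step, n_steps):
    # Iterated signed-gradient steps, each followed by a projection back
    # onto the L-infinity ball of radius eps around x and onto [0, 1].
    # This sketch starts from the clean input; a random start is also common.
    x_adv = list(x)
    for _ in range(n_steps):
        g = input_grad(W, b, x_adv, y)
        x_adv = [xa + step * sign(gj) for xa, gj in zip(x_adv, g)]
        x_adv = [clip(clip(xa, xj - eps, xj + eps), 0.0, 1.0)
                 for xa, xj in zip(x_adv, x)]
    return x_adv

# Hypothetical toy model and input (placeholders, not from the paper).
W = [[1.0, -2.0, 0.5], [-1.0, 1.5, 0.3]]
b = [0.0, 0.1]
x = [0.5, 0.4, 0.6]
y = 0

x_fgsm = fgsm(W, b, x, y, eps=0.1)
x_pgd = pgd(W, b, x, y, eps=0.1, step=0.025, n_steps=10)
```

Both perturbed inputs stay within `eps` of `x` in every coordinate while increasing the loss on the true label. Adversarial training, in this picture, would add such perturbed inputs with their original labels to the training batches; the need to recompute them as the model changes is one plausible source of the extra computational cost mentioned above.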

Paper Structure (11 sections, 5 equations, 2 figures, 8 tables, 1 algorithm)

Introduction
Background - PEPITA
Results
Model training details
Baseline natural and adversarial performance
PEPITA's higher intrinsic adversarial robustness
PEPITA's advantageous adversarial training
PEPITA's advantageous fast adversarial training
Discussion
Limitations and future work
Conclusion

Figures (2)

Figure 1: Comparison of BP and PEPITA. Schematic of BP's and PEPITA's architectures and learning mechanics with a single hidden layer.
Figure 2: PEPITA's advantageous adversarial training. The results presented here are for BP and PEPITA models trained adversarially with PGD samples on the MNIST task for $5$ different random seeds. (A) Natural-vs-adversarial performance trade-off: the most adversarially robust models were selected for different natural accuracy values. That is, target natural accuracy values distributed between $96\%$ and $98\%$ were chosen, and during hyperparameter selection the models with natural accuracy closest to each target and the best adversarial performance were selected. Each data point's coordinates are the average performances over the $5$ random seeds; that is, the axes represent the average adversarial and natural test accuracies across these seeds. The values reported in the first column (MNIST) of Table \ref{tab:acc_nat_results}, which correspond to the models with the best adversarial accuracies, are marked in red. (B) Natural (solid lines) and adversarial (dashed lines) test accuracies of the models encircled in (A). To demonstrate that the performance of BP does not increase further, both models were trained for twice the number of epochs. The shaded area represents the standard deviation across the models trained with $5$ different random seeds.
