Exploring Biologically Inspired Mechanisms of Adversarial Robustness

Konstantin Holzhausen; Mia Merlid; Håkon Olav Torvik; Anders Malthe-Sørenssen; Mikkel Elle Lepperød

TL;DR

This paper investigates robustness gaps in backpropagation-trained neural networks through biologically inspired local learning that yields smooth encodings. By implementing a Krotov–Hopfield two-layer model with winner-take-all dynamics and analyzing latent covariance spectra, the authors reveal a near power-law decay $\lambda_n \approx \lambda_1 n^{-\alpha}$ in the region $n<800$, linking spectrum, geometry, and robustness. Regularization and pruning strategies modulate spectral properties and robustness, showing that a mechanistic, locally learning model can achieve improved resilience while maintaining expressivity, and suggesting power-law spectra as a hallmark of robust representations in both biological and artificial systems. These findings offer a principled path toward trustworthy AI and deepen understanding of how robust neural networks may be realized in mammalian brains.
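To make the power-law relation $\lambda_n \approx \lambda_1 n^{-\alpha}$ concrete, the following is a minimal sketch of how such a covariance spectrum could be computed and how $\alpha$ could be estimated over the region $n<800$. The function names, the log-log least-squares fit, and the `n_max` parameter are assumptions made for illustration; the fitting procedure the authors actually use is not stated in this summary.

```python
import numpy as np

def covariance_spectrum(h):
    """Eigenvalues of the covariance of representations h (samples x units), descending."""
    h_centered = h - h.mean(axis=0, keepdims=True)
    cov = h_centered.T @ h_centered / (h.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    return np.clip(eigvals, 0.0, None)       # remove tiny negative round-off values

def fit_power_law_exponent(eigvals, n_max=800):
    """Fit log(lambda_n) = log(lambda_1) - alpha * log(n) for n < n_max (assumed procedure)."""
    n_used = min(n_max - 1, len(eigvals))
    n = np.arange(1, n_used + 1)
    lam = eigvals[:n_used]
    mask = lam > 0                            # the logarithm needs positive eigenvalues
    slope, _ = np.polyfit(np.log(n[mask]), np.log(lam[mask]), 1)
    return -slope

# Usage on a synthetic spectrum with a known exponent of 1.5
n = np.arange(1.0, 1001.0)
alpha = fit_power_law_exponent(3.0 * n ** -1.5)
```

On real data, `covariance_spectrum` would be applied to the latent representations $h(x)$ first, and the fitted exponent will depend on the chosen fitting range.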

Abstract

Backpropagation-optimized artificial neural networks, while precise, lack robustness, leading to unforeseen behaviors that affect their safety. Biological neural systems already solve some of these issues. Unlike artificial models, biological neurons adjust connectivity based on neighboring cell activity. Understanding the biological mechanisms of robustness can pave the way towards building trustworthy and safe systems. Robustness in neural representations is hypothesized to correlate with the smoothness of the encoding manifold. Recent work suggests that power-law covariance spectra, which were observed in the primary visual cortex of mice, indicate a balanced trade-off between accuracy and robustness in representations. Here, we show that unsupervised local learning models with winner-takes-all dynamics learn such power-law representations, providing future studies with a mechanistic model that has this characteristic. Our research aims to understand the interplay between geometry, spectral properties, robustness, and expressivity in neural representations. Hence, we study the link between representation smoothness and spectrum by using weight, Jacobian, and spectral regularization while assessing performance and adversarial robustness. Our work serves as a foundation for future research into the mechanisms underlying power-law spectra and optimally smooth encodings in both biological and artificial systems. The insights gained may elucidate the mechanisms that realize robust neural networks in mammalian brains and inform the development of more stable and reliable artificial systems.
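As an illustration of what weight and Jacobian regularization penalize, here is a minimal sketch for a single hidden layer. The ReLU activation, the layer form $h(x) = \max(Wx, 0)$, and all function names are assumptions chosen for simplicity; they are not taken from the paper, whose exact architecture and penalty definitions are not given in this summary.

```python
import numpy as np

def hidden_representation(W, x):
    """Assumed hidden layer h(x) = max(W x, 0) for one input vector x."""
    return np.maximum(W @ x, 0.0)

def weight_penalty(W):
    """L2 weight penalty: squared Frobenius norm of the weight matrix."""
    return np.sum(W ** 2)

def jacobian_frobenius_penalty(W, x):
    """Squared Frobenius norm of dh/dx at x for h(x) = max(W x, 0)."""
    active = (W @ x > 0).astype(W.dtype)  # ReLU derivative per hidden unit
    jacobian = active[:, None] * W        # rows of inactive units become zero
    return np.sum(jacobian ** 2)

# Usage: both penalties would be added to a task loss with a weighting factor
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))
x = rng.normal(size=4)
penalty = jacobian_frobenius_penalty(W, x)
```

A small Jacobian norm means that small input perturbations change the representation only slightly, which is the sense in which such a penalty encourages a smoother encoding.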

Paper Structure (4 sections, 16 equations, 6 figures)

This paper contains 4 sections, 16 equations, 6 figures.

Introduction
Methods
Results
Discussion

Figures (6)

Figure 1: Synapse and representation characteristics after unsupervised training with the local learning rule. a): Synaptic connections presented as images. Some entry blocks appear to resemble noise (Raw). We prune noisy contributions to improve spectral properties (Pruned). b): Distribution of total variances per image block in $S$. The distribution is bimodal, with the major mode centered below $0.001$ and the minor mode around $0.002$. Pruning the higher-variance contributions by setting a cutoff threshold at $0.0015$ ablates noisy images in $S$ (see panel a)) and eliminates the initial drop in the representation's covariance spectrum (see panel c)). The latter results in comparatively clean power-law relations between the eigenvalues of the first components. c): Ordered covariance spectra of the representations corresponding to both the original (Raw) and the ablated (Pruned) set of synapses, subjected to CIFAR10 (left panel) or Gaussian noise (right panel).
Figure 2: Left panel: Relative accuracy as a function of the perturbation parameter $\epsilon$ for all models considered under the three adversarial attacks. Right panel: Distributions of the critical perturbation magnitude $\|\Delta x\|_{\text{crit.}}$, measured as L2 distance in input space (minimal fooling distance), across all originally correctly classified input images, for all models and attacks considered.
Figure 3: Relational plot between test accuracy and the median of critical distances $\|\Delta x\|_{\text{crit.}}$ across models as a measure of robustness regarding a): random perturbations and b): Projected Gradient Descent.
Figure 4: Normalized covariance (PCA) spectra of latent representations $h(x)$ across CIFAR10 and Gaussian white noise $\xi(t)$ random input. The displayed spectra are those of the signals themselves and of the hybrid model suggested by Krotov and Hopfield (2019), both after initialization (Initialized) and after unsupervised training (KH Layer).
Figure 5: Normalized covariance (PCA) spectra of latent representations $h(x)$ across CIFAR10 and Gaussian white noise $\xi(t)$ random input. The displayed spectra are those of the fully gradient optimised models without (SHLP) and with regularisation (L2, JacReg, SpecReg).
...and 1 more figures
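
The variance-based pruning described in the Figure 1 caption can be sketched as follows. This is a hedged illustration only: it assumes that each row of the synapse matrix $S$ holds one hidden unit's weights (one "image block"), that the per-block total variance is the variance of that row, and that pruning means setting the row to zero. The function name and the `cutoff` argument are hypothetical.

```python
import numpy as np

def prune_high_variance_units(S, cutoff):
    """Zero out rows of S whose variance across input pixels exceeds cutoff."""
    variances = S.var(axis=1)        # one variance value per hidden unit
    keep = variances <= cutoff       # boolean mask of units to retain
    return S * keep[:, None], keep

# Usage with toy numbers: the middle unit is noisy and gets pruned
S = np.array([[0.0, 0.0, 0.0, 0.0],
              [1.0, -1.0, 1.0, -1.0],
              [0.01, -0.01, 0.01, -0.01]])
S_pruned, keep = prune_high_variance_units(S, cutoff=0.5)
```

Whether pruned units are zeroed or removed entirely changes the dimensionality of the representation, so the choice would affect the tail of the covariance spectrum.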