Limited but consistent gains in adversarial robustness by co-training object recognition models with human EEG

Manshan Guo; Bhavin Choksi; Sari Sadiya; Alessandro T. Gifford; Martina G. Vilas; Radoslaw M. Cichy; Gemma Roig

TL;DR

This work tackles the vulnerability of object-recognition networks to adversarial perturbations by testing whether aligning model representations with human EEG responses improves robustness. It introduces a dual-task learning framework in which a ResNet50 backbone jointly predicts EEG signals and object labels, trained on the large-scale THINGS EEG dataset with 17-channel recordings sampled at 100 Hz. The study finds a positive correlation between EEG-prediction accuracy and robustness gains across multiple architectures and attacks, with the strongest signals around 100 ms post-stimulus and mid-level parieto-occipital channels driving much of the effect; however, the gains are modest and persist even with shuffled EEG controls. These results suggest that scalable, brain-informed regularization via EEG data can aid adversarial robustness, motivating larger, more diverse EEG datasets and multimodal stimulus conditions to amplify the effect.

Abstract

In contrast to human vision, artificial neural networks (ANNs) remain relatively susceptible to adversarial attacks. To address this vulnerability, efforts have been made to transfer inductive bias from human brains to ANNs, often by training the ANN representations to match their biological counterparts. Previous works relied on brain data acquired in rodents or primates using invasive techniques, from specific regions of the brain, under non-natural conditions (anesthetized animals), and with stimulus datasets lacking diversity and naturalness. In this work, we explored whether aligning model representations to human EEG responses to a rich set of real-world images increases the robustness of ANNs. Specifically, we trained ResNet50-backbone models on a dual task of classification and EEG prediction, and evaluated their EEG prediction accuracy and robustness to adversarial attacks. We observed a significant correlation between the networks' EEG prediction accuracy, often highest around 100 ms post stimulus onset, and their gains in adversarial robustness. Although the effect size was limited, effects were consistent across different random initializations and robust across architectural variants. We further teased apart the data from individual EEG channels and observed the strongest contribution from electrodes in the parieto-occipital regions. The demonstrated utility of human EEG for such tasks opens up avenues for future efforts that scale to larger datasets under diverse stimulus conditions with the promise of stronger effects.
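
The dual task of classification and EEG prediction described in the abstract can be illustrated with a combined objective. The sketch below is a minimal NumPy illustration and not the authors' implementation: the mean-squared-error EEG loss, the mixing weight `alpha`, and the array shapes (17 channels, 100 time points per trial) are assumptions made for the example.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def dual_task_loss(logits, labels, eeg_pred, eeg_true, alpha=0.5):
    """Weighted sum of a classification loss and an EEG regression loss.

    alpha is a hypothetical mixing weight; MSE is an assumed choice of EEG loss.
    """
    ce = softmax_cross_entropy(logits, labels)
    mse = np.mean((eeg_pred - eeg_true) ** 2)
    return (1.0 - alpha) * ce + alpha * mse

# Synthetic stand-ins for one batch (illustrative shapes only).
rng = np.random.default_rng(0)
batch, n_classes, n_channels, n_times = 4, 10, 17, 100
logits = rng.normal(size=(batch, n_classes))
labels = rng.integers(0, n_classes, size=batch)
eeg_true = rng.normal(size=(batch, n_channels, n_times))
eeg_pred = eeg_true + 0.1 * rng.normal(size=eeg_true.shape)

loss = dual_task_loss(logits, labels, eeg_pred, eeg_true)
```

In a setup like this, setting `alpha=0` recovers plain classification training, which gives a natural baseline against which a robustness gain could be measured.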

Paper Structure (27 sections, 4 equations, 7 figures, 1 table)

Introduction
Related work
Methods
Dataset
Architectures and training
EEG prediction evaluation
Adversarial robustness evaluation and Robustness gain
Correlation between adversarial robustness gain and EEG prediction
Results
Adversarial robustness gains were positively correlated with the models' EEG prediction
Electrodes from mid-level EEG channels contribute most strongly to robustness
Discussion and Conclusion
Appendix
EEG-Images pairs
EEG-Image pre-processing
...and 12 more sections

Figures (7)

Figure 1: Paradigm for improving adversarial robustness via co-training with human EEG: We first trained dual-task learning (DTL) models with original and shuffled EEG data and then evaluated their robustness against various adversarial attacks. We trained four clusters of ResNet50 backbone models, each incorporating a different independent EEG predictor: Dense Layers (CNN), Recurrent Neural Networks (RNN), Transformer, and Attention layers. Finally, we measured the relationship between adversarial robustness gain and EEG prediction accuracy.
Figure 2: Adversarial robustness gain was correlated with EEG prediction accuracy. (A) The correlation between mean adversarial robustness gain ($Avg\_Gain_{DTL}$) and mean EEG prediction accuracy ($Avg\_PCC\_tps$) peaked at 0.09 s. (B) Adversarial robustness gains across all three attacks were significantly correlated with the prediction accuracy of EEG from 0.09 s to 0.14 s. The correlation values $R^2$ correspond to the peak $R^2$ in (A). Colors denote architecture type, and markers denote whether the features used for EEG prediction came from only the 4th block, from both the 3rd and 4th blocks (via concatenating/averaging of features from the 3rd and 4th blocks, all 4 blocks, the last 3 blocks...), or from other blocks excluding the 4th. Blue squares represent models that integrate 3rd- and 4th-block features for EEG prediction, which generally achieved higher $Avg\_Gain_{DTL}$ and $Avg\_PCC\_tps$. (C) EEG prediction accuracy of our most robust model (marked by the red arrow in B), measured with Pearson correlation coefficients. The correlation around 100 ms is significant (p < 0.05, Bonferroni corrected). (D) Adversarial robustness gain ($Gain_{DTL}(\epsilon)$) of the model marked in (B), along with the controls co-trained on shuffled and randomly generated EEG. Shaded regions represent the standard error over training seeds and subjects.
Figure 3: Contribution of individual EEG electrodes to network robustness. (A) The PCC values across channels, calculated as the correlation between the predicted and the real EEG from 0.09 s to 0.14 s. The lower and upper noise ceilings are calculated as per Gifford et al. (2022) and are denoted by black and red lines, respectively. (B) A top-down view of a 64-channel EEG layout, with the 17 electrodes covering occipital and parietal cortex colored by the PCC values from (A) for each electrode. (C) Correlation values between the robustness gains (for each attack) and PCC values for each EEG channel.
Figure 4: 2 architectures in RNN cluster.
Figure 5: 3 architectures in CNN cluster.
...and 2 more figures
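
The quantities in the Figure 2 and Figure 3 captions (Pearson correlation between predicted and real EEG, averaged over the 0.09 s to 0.14 s window, and its correlation with robustness gain across models) can be sketched as follows. This is a hedged illustration on synthetic data, not the paper's evaluation code: computing the correlation across image conditions, the epoch layout in `times_ms`, and the per-model numbers in `model_pcc` and `model_gain` are all assumptions made up for the example.

```python
import numpy as np

def pcc_per_timepoint(pred, real):
    """Pearson correlation across conditions for every (channel, time) entry.

    pred, real: arrays of shape (n_conditions, n_channels, n_times).
    Returns an array of shape (n_channels, n_times).
    """
    pred_c = pred - pred.mean(axis=0)
    real_c = real - real.mean(axis=0)
    num = (pred_c * real_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (real_c ** 2).sum(axis=0))
    return num / den

def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

# Synthetic EEG: 50 conditions, 17 channels, 100 time points (illustrative).
rng = np.random.default_rng(0)
n_conditions, n_channels, n_times = 50, 17, 100
real = rng.normal(size=(n_conditions, n_channels, n_times))
pred = real + rng.normal(size=real.shape)  # noisy stand-in for a model prediction

pcc = pcc_per_timepoint(pred, real)

# Assumed epoch layout: 100 Hz samples starting 200 ms before stimulus onset.
times_ms = np.arange(n_times) * 10 - 200
window = (times_ms >= 90) & (times_ms <= 140)
avg_pcc_window = pcc[:, window].mean()      # one summary score per model
per_channel_pcc = pcc[:, window].mean(axis=1)  # one score per electrode

# Made-up per-model summaries, only to show the across-model correlation step.
model_pcc = np.array([0.05, 0.08, 0.10, 0.12, 0.15])
model_gain = np.array([0.2, 0.5, 0.4, 0.9, 1.1])
r = pearson_r(model_pcc, model_gain)
```

Integer millisecond time stamps are used for the window mask to avoid floating-point edge effects at the 0.09 s boundary.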
