Adversarial Attacks on Graph Neural Networks via Meta Learning

Daniel Zügner, Stephan Günnemann

TL;DR

This work investigates training-time poisoning attacks on graph neural networks for node classification by treating the input graph as a hyperparameter and solving the resulting bilevel optimization problem with meta-gradients. A greedy, memory-efficient meta-gradient approach perturbs a small fraction of edges to maximize post-training misclassification, and the resulting attacks transfer to unseen models and even to unsupervised embeddings. The authors also propose first-order and Reptile-inspired approximations that remain highly damaging while reducing computational overhead. Experimental results across multiple datasets and models demonstrate the vulnerability of GNNs to global poisoning under budgeted, unnoticeable perturbations, highlighting the need for defenses that address training-time threats and data integrity in graph-based learning.
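The bilevel problem mentioned above can be written out as follows. This is a sketch in notation introduced here for illustration, not taken from this page: $\Phi(G)$ is assumed to denote the set of perturbed graphs allowed by the budget, $f_{\theta}$ the classifier, $\mathcal{L}_{\text{atk}}$ the loss the attacker minimizes (for example, the negative of a classification loss), $\alpha$ the learning rate, and $T$ the number of inner training steps.

```latex
\begin{aligned}
\min_{\hat{G} \in \Phi(G)} \; & \mathcal{L}_{\text{atk}}\bigl(f_{\theta^{*}}(\hat{G})\bigr)
\quad \text{s.t.} \quad
\theta^{*} = \arg\min_{\theta} \mathcal{L}_{\text{train}}\bigl(f_{\theta}(\hat{G})\bigr), \\
\theta_{t+1} &= \theta_{t} - \alpha \, \nabla_{\theta_{t}} \mathcal{L}_{\text{train}}\bigl(f_{\theta_{t}}(G)\bigr),
\qquad t = 0, \dots, T-1, \\
\nabla_{G}^{\text{meta}} &= \nabla_{G} \, \mathcal{L}_{\text{atk}}\bigl(f_{\theta_{T}}(G)\bigr).
\end{aligned}
```

The first line is the bilevel problem. The second approximates the inner minimization by $T$ steps of gradient descent. The third is the meta-gradient: because $\theta_{T}$ depends on $G$ through every training step, differentiating with respect to the graph means differentiating through the whole training procedure, which is what "treating the graph as a hyperparameter" amounts to.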

Abstract

Deep learning models for graphs have advanced the state of the art on many tasks. Despite their recent success, little is known about their robustness. We investigate training-time attacks on graph neural networks for node classification that perturb the discrete graph structure. Our core principle is to use meta-gradients to solve the bilevel problem underlying training-time attacks, essentially treating the graph as a hyperparameter to optimize. Our experiments show that small graph perturbations consistently lead to a strong decrease in performance for graph convolutional networks, and even transfer to unsupervised embeddings. Remarkably, the perturbations created by our algorithm can misguide the graph neural networks such that they perform worse than a simple baseline that ignores all relational information. Our attacks do not assume any knowledge about or access to the target classifiers.
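To make "treating the graph as a hyperparameter" concrete, here is a minimal pure-Python sketch of a greedy structure-poisoning loop on a six-node toy graph. It is not the authors' implementation, and several pieces are assumptions made for illustration:

  • The surrogate model is logistic regression on neighbor-averaged scalar features, not a graph convolutional network.
  • The derivative of the post-training loss with respect to each edge weight is estimated by central finite differences with retraining from a fixed initialization, as a stand-in for backpropagating through training.
  • The rule that a node pair is flipped at most once is a simplification, and all function names are hypothetical.

The attacker's objective here is the post-training loss on the labeled nodes, which it tries to increase.

```python
import math

def propagate(adj, x):
    """Average each node's scalar feature over its neighbors and itself."""
    n = len(adj)
    h = []
    for i in range(n):
        weights = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        h.append(sum(w * xj for w, xj in zip(weights, x)) / sum(weights))
    return h

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(adj, x, y, idx, steps=100, lr=0.5):
    """Inner problem: gradient descent on logistic loss from a fixed init."""
    h = propagate(adj, x)
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for i in idx:
            err = sigmoid(w * h[i] + b) - y[i]
            gw += err * h[i]
            gb += err
        w -= lr * gw / len(idx)
        b -= lr * gb / len(idx)
    return w, b, h

def post_training_loss(adj, x, y, idx):
    """Outer objective: cross-entropy on labeled nodes after training."""
    w, b, h = train(adj, x, y, idx)
    total = 0.0
    for i in idx:
        p = sigmoid(w * h[i] + b)
        total -= y[i] * math.log(p + 1e-12) + (1 - y[i]) * math.log(1 - p + 1e-12)
    return total / len(idx)

def meta_gradient(adj, x, y, idx, u, v, eps=1e-3):
    """Finite-difference estimate of d(post-training loss) / d(a_uv)."""
    def perturbed(delta):
        a = [row[:] for row in adj]
        a[u][v] += delta
        a[v][u] += delta
        return post_training_loss(a, x, y, idx)
    return (perturbed(eps) - perturbed(-eps)) / (2 * eps)

def greedy_attack(adj, x, y, idx, budget):
    """Flip one edge per step, choosing the flip with the highest score."""
    adj = [row[:] for row in adj]
    n = len(adj)
    flips = []
    for _ in range(budget):
        best, best_score = None, -math.inf
        for u in range(n):
            for v in range(u + 1, n):
                if (u, v) in flips:  # assumption: flip each pair at most once
                    continue
                grad = meta_gradient(adj, x, y, idx, u, v)
                # Inserting (a_uv = 0) helps if grad > 0; deleting if grad < 0.
                score = grad * (1 - 2 * adj[u][v])
                if score > best_score:
                    best, best_score = (u, v), score
        u, v = best
        adj[u][v] = adj[v][u] = 1 - adj[u][v]
        flips.append((u, v))
    return adj, flips

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1
x = [1.0, 0.8, 0.9, -1.0, -0.9, -0.8]   # one scalar feature per node
y = [1, 1, 1, 0, 0, 0]                  # binary class labels
labeled = [0, 1, 3, 4]                  # nodes whose labels are used

poisoned, flips = greedy_attack(adj, x, y, labeled, budget=2)
print("flipped pairs:", flips)
print("clean loss:   ", post_training_loss(adj, x, y, labeled))
print("poisoned loss:", post_training_loss(poisoned, x, y, labeled))
```

Each score is a first-order estimate, so a discrete flip is not guaranteed to raise the loss by the predicted amount. Finite differences need two retraining runs per candidate pair, which is only workable on toy graphs. Differentiating through the training steps with automatic differentiation yields the gradient for all pairs at once.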

Paper Structure

This paper contains 17 sections, 11 equations, 14 figures, 8 tables, 2 algorithms.

Figures (14)

  • Figure 1: Change in accuracy of GCN on Cora-ML for increasing number of perturbations.
  • Figure 2: Comparison with logistic regression baseline on Citeseer.
  • Figure 3: Accuracy of clean/corrupted graph and weights.
  • Figure 4: Poisoning results with limited knowledge about the graph (i.e., on a subgraph) after 10% changes.
  • Figure 5: Analysis of adversarially inserted edges.
  • ...and 9 more figures