Attacking Graph Neural Networks with Bit Flips: Weisfeiler and Lehman Go Indifferent

Lorenz Kummer, Samir Moustafa, Nils N. Kriege, Wilfried N. Gansterer

TL;DR

This paper studies the vulnerability of quantized graph neural networks to bit-flip attacks by exploiting the injectivity-based expressivity that ties these models to the $1$-WL test. It introduces the Injectivity Bit Flip Attack (IBFA), grounded in unfolding-tree representations and WL-discrimination tasks, and proposes two input-data selection strategies (IBFA1/IBFA2) together with a KL-based loss that minimizes output differences between strategically chosen inputs. Empirical results on Open Graph Benchmark and TUDataset datasets indicate that IBFA degrades GIN-based models to nearly random output with relatively few bit flips, outperforming PBFA baselines and random bit flips, and that it transfers to other architectures such as GCN. The findings highlight a structural security risk for GNNs and motivate defenses that preserve the injectivity of quantized neighborhood aggregation functions and resist targeted bit perturbations. The work combines theoretical insights with comprehensive experiments to demonstrate a practically significant, architecture-aware vulnerability in GNNs.

Abstract

Prior attacks on graph neural networks have mostly focused on graph poisoning and evasion, neglecting the network's weights and biases. Traditional weight-based fault injection attacks, such as bit flip attacks used for convolutional neural networks, do not consider the unique properties of graph neural networks. We propose the Injectivity Bit Flip Attack, the first bit flip attack designed specifically for graph neural networks. Our attack targets the learnable neighborhood aggregation functions in quantized message passing neural networks, degrading their ability to distinguish graph structures and causing them to lose the expressivity of the Weisfeiler-Lehman test. Our findings suggest that exploiting mathematical properties specific to certain graph neural network architectures can significantly increase their vulnerability to bit flip attacks. Injectivity Bit Flip Attacks can degrade the maximally expressive Graph Isomorphism Networks trained on various graph property prediction datasets to random output by flipping only a small fraction of the network's bits, demonstrating their higher destructive power compared to a bit flip attack transferred from convolutional neural networks. Our attack is transparent and motivated by theoretical insights which are confirmed by extensive empirical results.
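
To make the notion of a bit flip on a quantized weight concrete, here is a minimal sketch; it is not the paper's implementation. It assumes 8-bit signed two's-complement weights with a single per-tensor scale. The abstract only states that the networks are quantized, so the bit width, the `scale` value, and the helper names `flip_bit` and `dequantize` are illustrative assumptions.

```python
def flip_bit(weight: int, bit: int, width: int = 8) -> int:
    """Flip one bit of a signed two's-complement integer of the given width."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    if not -(1 << (width - 1)) <= weight < (1 << (width - 1)):
        raise ValueError("weight does not fit into the given width")
    mask = (1 << width) - 1
    raw = (weight & mask) ^ (1 << bit)  # unsigned bit pattern with one bit toggled
    # Reinterpret the unsigned pattern as a signed value.
    return raw - (1 << width) if raw >= (1 << (width - 1)) else raw


def dequantize(q: int, scale: float) -> float:
    """Map a quantized integer back to a real value (assumed per-tensor scale)."""
    return q * scale


if __name__ == "__main__":
    scale = 0.02  # hypothetical quantization scale
    w = 3         # hypothetical quantized weight
    for bit in (0, 7):
        q = flip_bit(w, bit)
        print(f"bit {bit}: {w} -> {q} (real value {dequantize(q, scale):.2f})")
```

In this sketch, flipping the least significant bit of the weight 3 yields 2, while flipping the most significant bit yields -125. A single flip can therefore change both the sign and most of the magnitude of a weight, which is what makes the choice of bit and weight matter for an attacker.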

Paper Structure

This paper contains 32 sections, 3 theorems, 19 equations, 10 figures, 2 tables.

Key Result

Lemma 3.2

Let $k \geq 0$ and let $u$, $v$ be nodes. Then $c_l^{(k)}(u)=c_l^{(k)}(v) \Longleftrightarrow T^{(k)}(u) \simeq T^{(k)}(v)$.
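
The equivalence in Lemma 3.2 can be checked empirically on a small graph. The sketch below assumes the standard definitions, which this summary does not spell out: $c_l^{(k)}$ is the 1-WL color after $k$ refinement rounds starting from node labels, and $T^{(k)}(v)$ is the height-$k$ tree rooted at $v$ whose children are the height-$(k-1)$ unfolding trees of the neighbors of $v$. The function names and the example graph are hypothetical.

```python
from itertools import combinations


def wl_colors(adj, labels, k):
    """Run k rounds of 1-WL color refinement and return a color per node."""
    colors = dict(labels)
    for _ in range(k):
        # Signature: own color plus the sorted multiset of neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return colors


def unfolding_tree(adj, labels, v, k):
    """Canonical form of the height-k unfolding tree rooted at v.

    Two rooted labeled trees are isomorphic exactly when these forms are equal.
    """
    if k == 0:
        return (labels[v], ())
    children = sorted(unfolding_tree(adj, labels, u, k - 1) for u in adj[v])
    return (labels[v], tuple(children))


def lemma_holds(adj, labels, k):
    """Check: equal WL colors <=> isomorphic unfolding trees, for all node pairs."""
    colors = wl_colors(adj, labels, k)
    trees = {v: unfolding_tree(adj, labels, v, k) for v in adj}
    return all(
        (colors[u] == colors[v]) == (trees[u] == trees[v])
        for u, v in combinations(adj, 2)
    )


# Hypothetical example: a triangle (0, 1, 2) with a tail 2-3-4, uniform labels.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
LABELS = {v: 0 for v in ADJ}

if __name__ == "__main__":
    for k in range(4):
        print(k, lemma_holds(ADJ, LABELS, k))
```

In the example, nodes 0 and 3 both have degree 2 and so share a color after one round, but they are separated after two rounds because their neighbors have different degrees. Their unfolding trees of height 2 differ in the same way.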

Figures (10)

  • Figure 1: Example of two non-isomorphic unfolding trees $T^{(2)}(u) \not \simeq T^{(2)}(v)$ of height 2 associated with the nodes $u$ and $v$. A function solving a WL-discrimination task for $k=2$ must be able to discriminate $u$ and $v$ based on the structure of their unfolding trees.
  • Figure 2: Exemplary results of the 2-layer GNNs $F^{(2)}$ and $\hat{F}^{(2)}$ using $f^{(i)}$ and $\hat{f}^{(i)}$, respectively, for $i \in \{1,2\}$. Nodes having the same embedding are shown in the same color and are labeled with the same integer. Although $\hat{f}^{(1)}$ is non-injective and $\hat{F}^{(1)}$ is coarser than $F^{(1)}$, we have $F^{(2)}=\hat{F}^{(2)}$. The final output corresponds to the WL coloring.
  • Figure 3: IBFA1/2's integration of PBS and input data selection strategies for $k$ attack iterations. In the first attack iteration, the input data selections of IBFA1 and IBFA2 are identical.
  • Figure 4: Pre- (clean) and post-attack test quality metrics AP, AUROC or ACC for different BFA variants on a 5-layer GIN trained on six ogbg-mol and two TUDataset datasets, number of bit flips, averages of 10 runs.
  • Figure 5: Jaccard distances $J_m(C_l^{(k)}(G_i), C_l^{(k)}(G_j)), i \neq j$, averaged over all graphs $G_i, G_j$ and tasks (classes) in each randomly drawn sample of 3200 graphs per dataset, indicating $\varepsilon$-GLWL-discrimination tasks. The dotted lines distinguish molecular and social datasets for various numbers of WL-iterations ($k$) from 1 to 7 and suggest possible choices for $\varepsilon$.
  • ...and 5 more figures

Theorems & Definitions (10)

  • Definition 3.1: Unfolding Tree dinverno2021aup
  • Lemma 3.2: dinverno2021aup
  • Proposition 3.3
  • Definition 3.4: WL-discrimination task
  • Proposition 3.5
  • Definition 3.6: GLWL-discrimination task
  • Definition 3.7: Multiset Jaccard distance parapar2008winnow
  • Definition 3.8: $\varepsilon$-GLWL-discrimination task
  • proof
  • proof
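
The multiset Jaccard distance named in Definition 3.7 and used in Figure 5 can be sketched as follows. The exact form of the definition is not reproduced in this summary, so the sketch assumes the common variant in which the multiset intersection takes the minimum multiplicity of each element and the multiset union takes the maximum; the paper's normalization may differ. The function name and the example color multisets are hypothetical.

```python
from collections import Counter


def multiset_jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for multisets given as iterables.

    Assumed convention: intersection uses min counts, union uses max counts.
    Two empty multisets are treated as identical (distance 0).
    """
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())  # Counter | Counter keeps max counts
    if union == 0:
        return 0.0
    inter = sum((ca & cb).values())  # Counter & Counter keeps min counts
    return 1.0 - inter / union


if __name__ == "__main__":
    # Hypothetical multisets of WL colors for two graphs.
    colors_g1 = [0, 0, 1]
    colors_g2 = [0, 1, 1, 2]
    print(multiset_jaccard_distance(colors_g1, colors_g2))
```

For the two example multisets the intersection has size 2 and the union has size 5, giving a distance of 0.6. Comparing color multisets of different graphs in this way only makes sense when the colors come from a palette shared across the graphs.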