Attacking Graph Neural Networks with Bit Flips: Weisfeiler and Lehman Go Indifferent
Lorenz Kummer, Samir Moustafa, Nils N. Kriege, Wilfried N. Gansterer
TL;DR
This paper studies the vulnerability of quantized graph neural networks to bit-flip attacks, exploiting the fact that their expressivity, which is tied to the $1$-WL test, depends on injective neighborhood aggregation. It introduces the Injectivity Bit Flip Attack (IBFA), grounded in unfolding-tree representations and WL-discrimination tasks, and proposes two input-data selection strategies (IBFA1/IBFA2) together with a KL-based loss that minimizes the difference between the model's outputs on strategically chosen inputs. Empirical results on the Open Graph Benchmark and TUDataset indicate that IBFA degrades GIN-based models to nearly random output with relatively few bit flips, outperforming both the traditional PBFA baseline and random bit flips, and that it transfers to other architectures such as GCN. The findings highlight a structural security risk for GNNs and motivate defenses that preserve the injectivity of quantized neighborhood aggregation and resist targeted bit perturbations. The work combines theoretical insights with comprehensive experiments to demonstrate a practically significant, architecture-aware vulnerability in GNNs.
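The idea of a KL-based loss that measures how differently a model responds to two inputs can be illustrated with a minimal sketch. The function names, the direction of the divergence, and the use of softmax over raw logits are assumptions made for illustration; this summary does not specify the exact loss used by IBFA.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0); it is an illustrative choice.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def output_difference(logits_a, logits_b):
    """Hypothetical helper: KL divergence between the class distributions
    a model produces for two different inputs. A value near zero means the
    model treats both inputs alike, i.e. it has become "indifferent"."""
    return kl_divergence(softmax(logits_a), softmax(logits_b))


# Outputs for two structurally different graphs (made-up logits).
distinct = output_difference([2.0, -1.0], [-1.0, 2.0])
collapsed = output_difference([0.3, 0.1], [0.3, 0.1])
print(distinct > collapsed)
```

Under these assumptions, an attacker would search for bit flips that drive this quantity toward zero for inputs the intact model separates.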
Abstract
Prior attacks on graph neural networks have mostly focused on graph poisoning and evasion, neglecting the network's weights and biases. Traditional weight-based fault injection attacks, such as bit flip attacks used for convolutional neural networks, do not consider the unique properties of graph neural networks. We propose the Injectivity Bit Flip Attack, the first bit flip attack designed specifically for graph neural networks. Our attack targets the learnable neighborhood aggregation functions in quantized message passing neural networks, degrading their ability to distinguish graph structures so that they lose the expressivity of the Weisfeiler-Lehman test. Our findings suggest that exploiting mathematical properties specific to certain graph neural network architectures can significantly increase their vulnerability to bit flip attacks. Injectivity Bit Flip Attacks can degrade the maximally expressive Graph Isomorphism Networks trained on various graph property prediction datasets to random output by flipping only a small fraction of the network's bits, demonstrating their higher destructive power compared to a bit flip attack transferred from convolutional neural networks. Our attack is transparent and motivated by theoretical insights, which are confirmed by extensive empirical results.
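To make the notion of a bit flip in a quantized weight concrete, the following minimal sketch flips one bit of a single integer weight. It assumes 8-bit signed two's complement storage, which is a common quantization format but is not stated in the abstract; the function name `flip_bit_int8` is hypothetical.

```python
def flip_bit_int8(weight, bit):
    """Flip one bit of a signed 8-bit weight (two's complement assumed).

    weight: integer in [-128, 127]
    bit:    bit position, 0 (least significant) to 7 (sign bit)
    """
    if not -128 <= weight <= 127:
        raise ValueError("weight must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    unsigned = weight & 0xFF           # reinterpret as an unsigned byte
    flipped = unsigned ^ (1 << bit)    # toggle the selected bit
    # Map the byte back to the signed range.
    return flipped - 256 if flipped >= 128 else flipped


print(flip_bit_int8(3, 0))  # least significant bit: small change
print(flip_bit_int8(3, 7))  # sign bit: large change
```

Flipping the sign bit changes the weight far more than flipping a low-order bit, which is why the choice of which bits to flip matters to an attacker working with a limited budget of flips.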
