Explainability-Based Adversarial Attack on Graphs Through Edge Perturbation
Dibaloke Chanda, Saba Heidari Gheshlaghi, Nasim Yahya Soltani
TL;DR
This study addresses the vulnerability of graph neural networks to adversarial attacks by introducing an explainability-based edge perturbation strategy. It identifies important subgraphs using GNNExplainer and PGExplainer, then inserts edges between nodes of different classes and deletes edges between same-class nodes within those important regions to degrade node classification performance under test-time perturbations. Across three architectures (GCN, GAT, GraphSAGE) and three datasets (Cora, CiteSeer, PubMed), inserting inter-class edges within the important subgraph yields larger misclassification rates than intra-class deletions, with GraphSAGE showing particular strength on larger graphs. The results highlight the value of explainability-driven attack design and suggest targeted defenses focusing on vulnerable subgraph structures to improve GNN robustness.
Abstract
Despite the success of graph neural networks (GNNs) in various domains, they are susceptible to adversarial attacks. Understanding these vulnerabilities is crucial for developing robust and secure applications. In this paper, we investigate the impact of test-time adversarial attacks through edge perturbations, which involve both edge insertions and deletions. A novel explainability-based method is proposed to identify important nodes in the graph and perform edge perturbation between these nodes. The proposed method is tested for node classification with three different architectures and three datasets. The results suggest that introducing edges between nodes of different classes has a greater impact than removing edges among nodes within the same class.
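The perturbation rule described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `perturb_edges`, the budgets `max_insert` and `max_delete`, and the toy graph are all hypothetical, and the list of important nodes is assumed to be supplied already (in the paper it would come from an explainer such as GNNExplainer or PGExplainer).

```python
from itertools import combinations


def perturb_edges(edges, labels, important_nodes, max_insert=2, max_delete=2):
    """Return a perturbed copy of an undirected edge set (illustrative only).

    edges:           set of (u, v) tuples with u < v
    labels:          dict mapping node -> class label
    important_nodes: nodes flagged as important by some explainer (assumed given)
    """
    perturbed = set(edges)
    inserted, deleted = [], []
    for u, v in combinations(sorted(important_nodes), 2):
        edge = (u, v)
        different_class = labels[u] != labels[v]
        if different_class and edge not in perturbed and len(inserted) < max_insert:
            # Insertion: connect two important nodes of different classes.
            perturbed.add(edge)
            inserted.append(edge)
        elif not different_class and edge in edges and len(deleted) < max_delete:
            # Deletion: drop an existing edge between same-class important nodes.
            perturbed.remove(edge)
            deleted.append(edge)
    return perturbed, inserted, deleted


# Hypothetical toy graph: four nodes, two classes.
edges = {(0, 1), (1, 2), (2, 3), (0, 3)}
labels = {0: "A", 1: "A", 2: "B", 3: "B"}
important = [0, 1, 2]  # assumed output of an explainer

perturbed, inserted, deleted = perturb_edges(edges, labels, important)
print("inserted:", inserted)
print("deleted:", deleted)
```

In this toy case the only same-class pair among the important nodes, (0, 1), loses its edge, and the unconnected different-class pair (0, 2) gains one. Comparing model accuracy on the original and perturbed graphs, separately for insertions and deletions, is the kind of measurement the abstract refers to.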
