Explainable Graph Neural Networks Under Fire

Zhong Li, Simon Geisler, Yuhang Wang, Stephan Günnemann, Matthijs van Leeuwen

TL;DR

The paper introduces GXAttack, the first optimization-based adversarial white-box attack on post-hoc GNN explanations. It shows that even small perturbations of the original graph structure that preserve the model's predictions may yield drastically different explanations.

Abstract

Predictions made by graph neural networks (GNNs) usually lack interpretability due to their complex computational behavior and the abstract nature of graphs. In an attempt to tackle this, many GNN explanation methods have emerged. Their goal is to explain a model's predictions and thereby obtain trust when GNN models are deployed in decision-critical applications. Most GNN explanation methods work in a post-hoc manner and provide explanations in the form of a small subset of important edges and/or nodes. In this paper we demonstrate that these explanations can unfortunately not be trusted, as common GNN explanation methods turn out to be highly susceptible to adversarial perturbations. That is, even small perturbations of the original graph structure that preserve the model's predictions may yield drastically different explanations. This calls into question the trustworthiness and practical utility of post-hoc explanation methods for GNNs. To be able to attack GNN explanation models, we devise a novel attack method dubbed GXAttack, the first optimization-based adversarial white-box attack method for post-hoc GNN explanations under such settings. Due to the devastating effectiveness of our attack, we call for an adversarial evaluation of future GNN explainers to demonstrate their robustness. For reproducibility, our code is available via GitHub.
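
The failure mode described in the abstract (prediction unchanged, explanation changed) can be made concrete with a small check. The sketch below is not GXAttack and does not come from the paper's code. It only illustrates how one might test whether a perturbation was "prediction-preserving" while shifting the explanatory subgraph. The function names, the top-k edge selection, the Jaccard overlap measure, and the edge scores are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def top_k_edges(scores: Dict[Edge, float], k: int) -> Set[Edge]:
    """Return the k highest-scoring edges, treated here as the explanatory subgraph."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def explanation_shift(
    pred_orig: int,
    pred_pert: int,
    scores_orig: Dict[Edge, float],
    scores_pert: Dict[Edge, float],
    k: int,
) -> Tuple[bool, float]:
    """Return (prediction preserved?, Jaccard overlap of the top-k explanation edges)."""
    expl_orig = top_k_edges(scores_orig, k)
    expl_pert = top_k_edges(scores_pert, k)
    union = expl_orig | expl_pert
    overlap = len(expl_orig & expl_pert) / len(union) if union else 1.0
    return pred_orig == pred_pert, overlap


# Made-up edge importance scores, standing in for the output of a post-hoc explainer.
scores_orig = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.05}
# Same graph with one hypothetical inserted edge (0, 4) and shifted scores.
scores_pert = {(0, 1): 0.1, (1, 2): 0.2, (2, 3): 0.7, (3, 4): 0.6, (0, 4): 0.3}

preserved, overlap = explanation_shift(1, 1, scores_orig, scores_pert, k=2)
print(preserved, overlap)
```

In this toy case the predicted class stays the same while the two explanations share no edges, which is the kind of outcome the abstract warns about. A real evaluation would take the scores from an actual explainer run on the original and the perturbed graph.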

Paper Structure (26 sections, 16 equations, 19 figures, 4 tables)

Figures (19)

  • Figure 1: When using post-hoc GNN explainers, the explanatory subgraph on a graph with "prediction-preserving" perturbations (right) can strongly differ from that on the original graph (left).
  • Figure 2: Sensitivity analysis w.r.t. maximally allowed perturbation budget (Max Budget) on Syn2 using GXAttack.
  • Figure 3: Sensitivity analysis w.r.t. maximal training epochs on Syn2 using GXAttack.
  • Figure : (a) Prediction confidence vs. original explanation accuracy on Syn2.
  • Figure : (a) Node degree vs. original explanation accuracy on Syn2.
  • ...and 14 more figures

Theorems & Definitions (1)

  • Definition 1: Attributed Graph