A Survey on Explainability of Graph Neural Networks

Jaykumar Kakkad, Jaspal Jannu, Kartik Sharma, Charu Aggarwal, Sourav Medya

TL;DR

The survey catalogs a comprehensive taxonomy of explainability methods for Graph Neural Networks, distinguishing factual (self-interpretable and post-hoc) and counterfactual explanations, and further organizing methods into decomposition, gradient, surrogate, perturbation, and generation families. It highlights self-interpretable approaches based on information and structural constraints, and surveys temporal, global, and causality-based explanations to broaden applicability. The paper also covers counterfactual strategies (perturbation, neural, and search-based), application domains, datasets (synthetic and real-world), and evaluation metrics (quantitative and qualitative), and discusses future directions such as global explanations and human-centric visualization. Overall, it provides a structured roadmap for understanding, comparing, and advancing interpretable graph-based machine learning in diverse, high-stakes domains.

Abstract

Graph neural networks (GNNs) are powerful graph-based deep-learning models that have gained significant attention and demonstrated remarkable performance in various domains, including natural language processing, drug discovery, and recommendation systems. However, combining feature information and combinatorial graph structures has led to complex non-linear GNN models. Consequently, this has increased the challenges of understanding the workings of GNNs and the underlying reasons behind their predictions. To address this, numerous explainability methods have been proposed to shed light on the inner mechanism of the GNNs. Explainable GNNs improve their security and enhance trust in their recommendations. This survey aims to provide a comprehensive overview of the existing explainability techniques for GNNs. We create a novel taxonomy and hierarchy to categorize these methods based on their objective and methodology. We also discuss the strengths, limitations, and application scenarios of each category. Furthermore, we highlight the key evaluation metrics and datasets commonly used to assess the explainability of GNNs. This survey aims to assist researchers and practitioners in understanding the existing landscape of explainability methods, identifying gaps, and fostering further advancements in interpretable graph-based machine learning.

Paper Structure (36 sections, 5 equations, 4 figures, 11 tables)

Figures (4)

  • Figure 1: Overview of the schema. (1) Factual. Information constraints: GIB, VGIB, GSAT, LRI; Structural constraints: DIR, ProtGNN, SEGNN, KER-GNN; Decomposition: CAM, Excitation-BP, DEGREE, GNN-LRP; Gradient-based: SA, Guided-BP, Grad-CAM; Surrogate: PGM-Ex, GraphLime, GraphSVX, ReLex, DnX; Perturbation-based: GNNExplainer, GraphMask, PGExplainer, ReFine, ZORRO, SubgraphX, GstarX; Generation: XGNN, RGExplainer, GNNInterpreter, GFlowExplainer, GEM. (2) Counterfactual. Search-based: MMACE, MEG; Neural network-based: RCExplainer, CLEAR; Perturbation-based: GREASE, CF2, CF-GNNExplainer.
  • Figure 2: (a) Self-interpretable and post-hoc architectures: In self-interpretable methods, the subgraph extraction module $g$ uses constraints to find an informative subgraph $G_s$ from the input graph $G$. The prediction module $f$ uses this $G_s$ to predict the label $Y$. In contrast, post-hoc methods treat the model as pre-trained with fixed weights. For any instance $G$, post-hoc methods generate an explanation using the model's input $\mathcal{D}$, its output $Y$, and in some cases the model's internal parameters. (b) White-box and black-box post-hoc methods: Methods are shown in their individual categories. Decomposition-based: CAM, Excitation-BP, DEGREE, GNN-LRP; Gradient-based: SA, Guided-BP, Grad-CAM; Surrogate: PGM-Ex, GraphLime, GraphSVX, ReLex, DnX; Perturbation-based: GNNExplainer, GraphMask, PGExplainer, ReFine, ZORRO, SubgraphX, GstarX; Generation-based: XGNN, RGExplainer, GNNInterpreter, GFlowExplainer, GEM.
  • Figure 3: (a) Surrogate: These methods follow a two-step process. For any instance $G$, they generate data from the neighbourhood of the prediction by taking multiple inputs $D$ in the locality and recording the model's predictions $Y$. A surrogate model is then fitted to this data. The explanation $E$ for the surrogate model serves as the explanation for the prediction. (b) Perturbation-based: These methods have two key modules: a subgraph extraction architecture and a scoring function. For an input $G$, the subgraph extraction module extracts a subgraph $G_s$. The model prediction $Y_s$ for the subgraph $G_s$ is scored against the actual prediction $Y$ using the scoring function. The feedback from the scoring function can be used to train the subgraph extraction module. Sometimes model parameters are also used as training input to the subgraph extraction module. The optimal subgraph $G_s^*$ acts as the final explanation $E$.
  • Figure 4: Self-interpretable methods: Every self-interpretable method has a subgraph extraction module and a prediction module. The subgraph extraction module (the function $g$) uses constraints to find an informative subgraph $G_s$ from the input graph $G$. The prediction module uses $G_s$ to predict the label $Y$. The figure also shows the techniques used by each method to implement these individual modules. Self-interpretable methods are categorized based on constraints: (1) Information constraint: GIB, VGIB, GSAT, LRI; (2) Structural constraint: DIR, ProtGNN, SEGNN, KER-GNN.
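
The perturbation-based loop described in the Figure 3(b) caption (extract a subgraph $G_s$, compare its prediction $Y_s$ with the original prediction $Y$, keep the best-scoring subgraph $G_s^*$) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: `toy_model` is a hypothetical stand-in for a pre-trained GNN with fixed weights, `score` is one possible scoring function, and an exhaustive search over edge subsets of size `k` replaces the trainable subgraph extraction module that methods in this family typically use.

```python
from itertools import combinations


def toy_model(num_nodes, edges, features):
    """Hypothetical stand-in for a fixed, pre-trained GNN.

    One round of sum aggregation over undirected edges,
    followed by a mean readout over all nodes.
    """
    agg = list(features)
    for u, v in edges:
        agg[u] += features[v]
        agg[v] += features[u]
    return sum(agg) / num_nodes


def score(y_sub, y_full):
    """Assumed scoring function: higher when Y_s is closer to Y."""
    return -abs(y_sub - y_full)


def explain(num_nodes, edges, features, k):
    """Return the k-edge subgraph whose prediction best matches Y.

    Exhaustive search is used only to keep the sketch short; it is
    not how learned extraction modules work and does not scale.
    """
    y_full = toy_model(num_nodes, edges, features)
    best = max(
        combinations(edges, k),
        key=lambda sub: score(toy_model(num_nodes, sub, features), y_full),
    )
    return list(best), y_full


# Toy path graph 0-1-2-3 where only nodes 0 and 1 carry signal.
edges = [(0, 1), (1, 2), (2, 3)]
features = [1.0, 1.0, 0.0, 0.0]
subgraph, y_full = explain(4, edges, features, k=1)
print(subgraph, y_full)  # prints: [(0, 1)] 1.25
```

In this toy setting the edge between the two signal-carrying nodes is selected, because removing the other edges changes the prediction least. A learned extraction module would replace the `max` over `combinations` with a parameterized mask trained on the scoring feedback.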