GNNX-BENCH: Unravelling the Utility of Perturbation-based GNN Explainers through In-depth Benchmarking

Mert Kosan; Samidha Verma; Burouj Armgaan; Khushbu Pahwa; Ambuj Singh; Sourav Medya; Sayan Ranu

TL;DR

This benchmarking study empowers stakeholders in the field of GNNs with a comprehensive understanding of the state-of-the-art explainability methods, potential research problems for further enhancement, and the implications of their application in real-world scenarios.

Abstract

Numerous explainability methods have been proposed to shed light on the inner workings of GNNs. Despite the inclusion of empirical evaluations in all the proposed algorithms, the interrogative aspects of these evaluations lack diversity. As a result, various facets of explainability pertaining to GNNs, such as a comparative analysis of counterfactual reasoners, their stability to variational factors such as different GNN architectures, noise, stochasticity in non-convex loss surfaces, feasibility amidst domain constraints, and so forth, have yet to be formally investigated. Motivated by this need, we present a benchmarking study on perturbation-based explainability methods for GNNs, aiming to systematically evaluate and compare a wide range of explainability techniques. Among the key findings of our study, we identify the Pareto-optimal methods that exhibit superior efficacy and stability in the presence of noise. Nonetheless, our study reveals that all algorithms are affected by stability issues when faced with noisy data. Furthermore, we have established that the current generation of counterfactual explainers often fails to provide feasible recourses due to violations of topological constraints encoded by domain-specific considerations. Overall, this benchmarking study empowers stakeholders in the field of GNNs with a comprehensive understanding of the state-of-the-art explainability methods, potential research problems for further enhancement, and the implications of their application in real-world scenarios.

Paper Structure (33 sections, 2 equations, 19 figures, 23 tables)

Introduction and Related Work
Contributions
Preliminaries and Background
Review of Perturbation-based GNN Reasoning
Benchmarking Framework
Empirical Evaluation
Comparative Analysis
Stability
Necessity and Reproducibility
Feasibility
Visualization-based Analysis
Concluding Insights and Potential Solutions
Acknowledgements
Experimental Setup
Benchmark Datasets
...and 18 more sections

Figures (19)

Figure 1: Structuring the space of the existing methods on GNN explainability.
Figure 2: Sufficiency of the factual explainers against the explanation size. For factual explanations, higher is better. For each dataset, we omit the methods that threw an out-of-memory (OOM) error.
Figure 3: Stability of factual explainers, measured as the Jaccard similarity of explanations under topological noise. The $x$-ticks (Noise) denote the number of perturbations made to the edge set of the original graph. A perturbation consists of randomly sampling $x$ negative edges (where $x$ is the value on the $x$-axis) and adding them to the original edge set (i.e., connecting pairs of nodes that were previously unconnected).
Figure D: Motifs used in the (a) Tree-Cycles, (b) Tree-Grid, and (c) BA-Shapes datasets for the node classification task. Note the following. (i) Tree-Cycles and Tree-Grid have labels 0 and 1 for the non-motif and the motif nodes, respectively. Hence, all nodes in (a) and (b) have label 1. (ii) The BA-Shapes dataset has $4$ classes. Non-motif nodes have label 0; motif nodes have integer labels depending on their position in the house motif: 1 (top node), 2 (middle nodes), and 3 (bottom nodes), as shown in (c).
Figure E: Sufficiency of the inductive factual explainers against the explanation size on test data only. For factual explanations, higher is better. For each dataset, we omit the methods that throw an out-of-memory (OOM) error and are not scalable.
...and 14 more figures
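
The noise procedure and stability measure described in the caption of Figure 3 can be sketched roughly as follows. This is a minimal illustration, not the benchmark's own code: the function names `add_negative_edges` and `jaccard` are hypothetical, the graph is assumed to be undirected with nodes numbered from 0, and the explainer itself is left out (any method that returns a set of edges as its explanation could be plugged in).

```python
import random
from itertools import combinations


def add_negative_edges(num_nodes, edges, x, seed=0):
    """Return the undirected edge set with `x` random negative edges added.

    A negative edge is a pair of nodes that is unconnected in the
    original graph. Edges are stored as frozensets so that (u, v)
    and (v, u) are treated as the same edge.
    """
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    candidates = [
        frozenset(pair)
        for pair in combinations(range(num_nodes), 2)
        if frozenset(pair) not in existing
    ]
    if x > len(candidates):
        raise ValueError("not enough unconnected node pairs to sample from")
    return existing | set(rng.sample(candidates, x))


def jaccard(explanation_a, explanation_b):
    """Jaccard similarity between two explanations given as collections."""
    a, b = set(explanation_a), set(explanation_b)
    if not a and not b:
        return 1.0  # two empty explanations are treated as identical
    return len(a & b) / len(a | b)


# Toy usage: a 4-cycle perturbed with one negative edge.
original_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
noisy_edges = add_negative_edges(4, original_edges, x=1)

# Hand-written stand-ins for what an explainer might return on the
# clean and the noisy graph (assumed values, for illustration only).
explanation_clean = {frozenset((0, 1)), frozenset((1, 2))}
explanation_noisy = {frozenset((0, 1)), frozenset((0, 2))}
stability = jaccard(explanation_clean, explanation_noisy)
```

Under this reading, a stability curve would be obtained by repeating the comparison for increasing values of `x` and averaging the Jaccard similarity over graphs and random seeds.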

Theorems & Definitions (2)

Definition 1: Perturbation-based Factual Reasoning
Definition 2: Counterfactual Reasoning
