GNNInterpreter: A Probabilistic Generative Model-Level Explanation for Graph Neural Networks
Xiaoqi Wang, Han-Wei Shen
TL;DR
GNNInterpreter addresses the need for trustworthy, model-level explanations of Graph Neural Networks by learning a probabilistic generative distribution over explanation graphs. For a target class $c$, it maximizes the joint objective $L(G) = \phi_c(\mathbf{A},\mathbf{Z},\mathbf{X}) + \mu\, \mathrm{sim_{cos}}(\psi(\mathbf{A},\mathbf{Z},\mathbf{X}), \bar{\boldsymbol{\psi}}_c)$, where $\mathbf{A}$, $\mathbf{Z}$, and $\mathbf{X}$ denote the adjacency matrix, edge features, and node features of the explanation graph, $\phi_c$ is the GNN's score for class $c$ (before softmax), $\psi$ is its graph embedding, $\bar{\boldsymbol{\psi}}_c$ is the average embedding of class-$c$ graphs, and $\mu$ weights the embedding-similarity term. Because graph topology and features are discrete, the method assumes a Gilbert random graph (edges sampled independently), relaxes the discrete variables with Concrete distributions, and applies the reparameterization trick so that the objective can be optimized with gradients. The approach supports diverse node and edge features, avoids hand-crafted domain rules, and includes regularization terms for sparsity, size control, and connectivity. The resulting explanations align with ground-truth patterns when the model is sound and otherwise reveal potential model pitfalls such as bias attribution. Empirical results on synthetic and real-world datasets show that GNNInterpreter often produces explanation graphs with target-class probabilities near 1.0 and runs substantially faster than XGNN, making it a practical tool for debugging and validating GNNs in critical applications.
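The relaxation and the objective described above can be illustrated with a minimal NumPy sketch. It is not the official implementation: the function names, the parameters `omega` (edge logits) and `tau` (temperature), and the symmetrization step for undirected graphs are assumptions made for illustration. Each edge is drawn independently from a binary Concrete (relaxed Bernoulli) distribution, so the sample is a deterministic, differentiable function of the logits and of uniform noise, which is what the reparameterization trick requires. The class score and embeddings are passed in as plain numbers and vectors, standing in for the outputs of a trained GNN.

```python
import numpy as np

def sample_relaxed_edges(omega, tau, rng):
    """Draw a relaxed adjacency matrix from independent binary Concrete edges.

    omega: (n, n) array of edge logits (hypothetical learnable parameters).
    tau:   temperature > 0; smaller values push samples toward 0 or 1.
    """
    # Reparameterization: the randomness comes only from uniform noise.
    eps = rng.uniform(1e-9, 1.0 - 1e-9, size=omega.shape)
    logistic_noise = np.log(eps) - np.log1p(-eps)
    relaxed = 1.0 / (1.0 + np.exp(-(omega + logistic_noise) / tau))
    # Assumption: undirected graph without self-loops, so keep the strict
    # upper triangle and mirror it.
    upper = np.triu(relaxed, k=1)
    return upper + upper.T

def cosine_similarity(u, v):
    """Cosine similarity with a small constant to avoid division by zero."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def objective(class_score, embedding, mean_class_embedding, mu):
    """L(G) = phi_c + mu * sim_cos(psi, psi_bar_c), for precomputed GNN outputs."""
    return class_score + mu * cosine_similarity(embedding, mean_class_embedding)

rng = np.random.default_rng(0)
adjacency = sample_relaxed_edges(np.zeros((4, 4)), tau=0.5, rng=rng)
score = objective(2.0, np.array([1.0, 0.0]), np.array([1.0, 0.0]), mu=0.5)
```

In a full pipeline the relaxed adjacency matrix would be fed to the GNN, and the gradient of the objective with respect to `omega` would be obtained with an automatic differentiation framework; NumPy is used here only to keep the sampling step easy to inspect.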
Abstract
Recently, Graph Neural Networks (GNNs) have significantly advanced the performance of machine learning tasks on graphs. However, this technological breakthrough makes people wonder: how does a GNN make such decisions, and can we trust its prediction with high confidence? When it comes to some critical fields, such as biomedicine, where making wrong decisions can have severe consequences, it is crucial to interpret the inner working mechanisms of GNNs before applying them. In this paper, we propose a model-agnostic model-level explanation method for different GNNs that follow the message passing scheme, GNNInterpreter, to explain the high-level decision-making process of the GNN model. More specifically, GNNInterpreter learns a probabilistic generative graph distribution that produces the most discriminative graph pattern the GNN tries to detect when making a certain prediction by optimizing a novel objective function specifically designed for the model-level explanation for GNNs. Compared to existing works, GNNInterpreter is more flexible and computationally efficient in generating explanation graphs with different types of node and edge features, without introducing another blackbox or requiring manually specified domain-specific rules. In addition, the experimental studies conducted on four different datasets demonstrate that the explanation graphs generated by GNNInterpreter match the desired graph pattern if the model is ideal; otherwise, potential model pitfalls can be revealed by the explanation. The official implementation can be found at https://github.com/yolandalalala/GNNInterpreter.
