TreeX: Generating Global Graphical GNN Explanations via Critical Subtree Extraction

Shengyao Lu, Jiuding Yang, Baochun Li, Di Niu

TL;DR

TreeX addresses the challenge of global, graphical explanations for WL-based GNNs by mining full $L$-hop subtrees incurred during message passing and representing each subtree with its last-layer root embedding. The method builds global graph concepts through clustering of local subtree embeddings and merges duplicates via isomorphism, then learns class-specific global-rule weights to explain both dataset-wide and individual predictions. It enables explaining incorrect predictions and provides faithful, interpretable motifs that align with ground-truth concepts, outperforming existing global explainers and rivaling local explainers in fidelity. This approach offers practical, scalable, and interpretable explanations for graph-structured data, with direct applicability to molecular and network analyses.

Abstract

The growing demand for transparency and interpretability in critical domains has driven increased interest in the explainability of Message-Passing (MP) Graph Neural Networks (GNNs). Although substantial research efforts have been made to generate explanations for individual graph instances, identifying global explaining concepts for a GNN still poses great challenges, especially when concepts are desired in a graphical form on the dataset level. While most prior works treat GNNs as black boxes, in this paper, we propose to unbox GNNs by analyzing and extracting critical subtrees incurred by the inner workings of message passing, which correspond to critical subgraphs in the datasets. By aggregating subtrees in an embedding space with an efficient algorithm, which does not require complex subgraph matching or search, we can make intuitive graphical explanations for Message-Passing GNNs on local, class and global levels. We empirically show that our proposed approach not only generates clean subgraph concepts on a dataset level, in contrast to existing global explaining methods which generate non-graphical rules (e.g., language or embeddings) as explanations, but is also capable of providing explanations for individual instances with performance comparable or even superior to leading local-level GNN explainers.
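The aggregation of subtrees in an embedding space can be pictured with a small sketch. It assumes each subtree is already represented by a fixed-length root embedding and uses plain k-means as a stand-in for the clustering step; the function name `kmeans`, the toy 2-D `embeddings`, and the choice of k-means itself are illustrative assumptions, not the authors' implementation.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return assign, centroids

# Hypothetical root embeddings of five subtrees (made-up numbers).
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0), (0.0, 0.0)]
assign, centroids = kmeans(embeddings, k=2)
print(assign)  # cluster index per subtree; each cluster is one candidate concept
```

Subtrees that land in the same cluster would be treated as instances of one candidate concept. Because only embeddings are compared, no subgraph matching is involved at this step.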

Paper Structure

This paper contains 42 sections, 5 theorems, 14 equations, 11 figures, 6 tables, 1 algorithm.

Key Result

Theorem 4.2

Given a graph $G=(V,E)$ with the countable input node features ${\mathbf{X}}$, and an $L$-layer GNN $f(\cdot)$ that updates the layer-wise node embeddings by eq:gnn. Then $\forall l\in \{1,\dots, L\}$ and $\forall v\in V$, the $l$-th layer node embedding ${\bm{h}}^{(l)}_v$ is a Perfect Rooted Tree Rep
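As an informal illustration of the objects involved, the sketch below builds a canonical string for the full $l$-hop subtree (the unfolding tree that message passing traverses) rooted at a node. The function `subtree_signature`, the adjacency-dict graph format, and the use of a string in place of an embedding are all assumptions made for illustration; the sketch does not reproduce the theorem or its proof.

```python
def subtree_signature(adj, labels, root, depth):
    """Canonical string of the full `depth`-hop unfolding tree rooted at `root`.

    adj:    dict mapping node -> list of neighbour nodes
    labels: dict mapping node -> hashable node label (the input feature)
    """
    if depth == 0:
        return str(labels[root])
    # Recurse into every neighbour, then sort so that the result does not
    # depend on the order in which neighbours are listed.
    children = sorted(subtree_signature(adj, labels, u, depth - 1)
                      for u in adj[root])
    return f"{labels[root]}({','.join(children)})"

# Toy path graph 0 - 1 - 2 with identical labels (hypothetical example).
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "C", 1: "C", 2: "C"}

end_a = subtree_signature(adj, labels, 0, 2)
end_b = subtree_signature(adj, labels, 2, 2)
middle = subtree_signature(adj, labels, 1, 2)
print(end_a, end_b, middle)
```

In this toy graph the two end nodes share a signature while the middle node differs, mirroring the intuition that nodes with the same full $l$-hop subtree are indistinguishable after $l$ rounds of message passing.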

Figures (11)

  • Figure 1: An illustrative example of the global explanations produced by TreeX and how the global explanations can be employed to explain individual instances. The global rule offers the optimal weights of different concepts to enhance the probability of predicting the target class.
  • Figure 2: Overview of our proposed approach. This figure illustrates our approach for a 2-layer GNN. The "subtrees" in this figure refer to the full $l$-hop subtrees. Phase 1: Collect subtrees in the graph, and extract local concepts by identifying the overlapping substructures. Phase 2: Extract global concepts by clustering the local concepts. Phase 3: Generate global rules for each target class.
  • Figure 3: Global explanations by TreeX (ours), GCNeuron and GLGExplainer. We run both baseline methods so that they explain the same GNN models as our approach. Due to space limits, the explanations on BAMultiShapes and NCI1 datasets are moved to the appendix.
  • Figure 4: Visualization of employing the global explanations produced by TreeX to discover the cause of the incorrect prediction of the GNN. Due to space limits, we omit the concepts that are not in this graph.
  • Figure 5: Global explanations by TreeX (ours), GCNeuron and GLGExplainer on BAMultiShapes and NCI1 datasets. We run both baseline methods so that they explain the same GNN models as our approach.
  • ...and 6 more figures

Theorems & Definitions (11)

  • Definition 4.1: Perfect Rooted Tree Representation
  • Theorem 4.2
  • Corollary A.1
  • proof
  • proof
  • Lemma B.1: Base step
  • proof
  • Lemma B.2: Inductive step
  • proof
  • Lemma B.3
  • ...and 1 more