How Interpretable Are Interpretable Graph Neural Networks?

Yongqiang Chen, Yatao Bian, Bo Han, James Cheng

TL;DR

This work designs a new interpretable GNN (XGNN) architecture, Graph Multilinear neT (GMT), that is provably more powerful in approximating the subgraph multilinear extension (SubMT), and shows that GMT outperforms the state of the art by up to 10% in both interpretability and generalizability across 12 regular and geometric graph benchmarks.

Abstract

Interpretable graph neural networks (XGNNs) are widely adopted in scientific applications involving graph-structured data. Existing XGNNs predominantly use an attention-based mechanism to learn edge or node importance, which they use to extract an interpretable subgraph and make predictions with it. However, the representational properties and limitations of these methods remain inadequately explored. In this work, we present a theoretical framework that formulates interpretable subgraph learning with the multilinear extension of the subgraph distribution, termed the subgraph multilinear extension (SubMT). Extracting the desired interpretable subgraph requires an accurate approximation of SubMT, yet we find that existing XGNNs can have a large gap in fitting SubMT. This approximation failure in turn degrades the interpretability of the extracted subgraphs. To mitigate the issue, we design a new XGNN architecture called Graph Multilinear neT (GMT), which is provably more powerful in approximating SubMT. We empirically validate our theoretical findings on a number of graph classification benchmarks. The results demonstrate that GMT outperforms the state of the art by up to 10% in both interpretability and generalizability across 12 regular and geometric graph benchmarks.
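
For orientation, the multilinear extension of a set function is conventionally written as below. This is the textbook form, not the paper's Definition 3.1 verbatim; reading $E$ as the edge set of the input graph, $\alpha_e$ as the sampling probability of edge $e$, and $f$ as the subgraph classifier is an assumption made here to connect it to SubMT.

```latex
F(\alpha) \;=\; \mathbb{E}_{S \sim \alpha}\bigl[f(S)\bigr]
\;=\; \sum_{S \subseteq E} f(S) \prod_{e \in S} \alpha_e \prod_{e \in E \setminus S} \bigl(1 - \alpha_e\bigr)
```

Under this reading, approximating SubMT means matching the expectation over sampled subgraphs, which in general differs from evaluating $f$ once on the probability-weighted graph.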

Paper Structure

This paper contains 55 sections, 6 theorems, 58 equations, 21 figures, 12 tables, 3 algorithms.

Key Result

Proposition 3.3

An XGNN based on a linear GNN with $k>1$ cannot satisfy Eq. eq:exp_issue, and thus cannot approximate SubMT.
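
A small numeric check can make the proposition concrete. The sketch below rests on several assumptions that are not stated in this summary: that $k$ is the number of layers, that the unresolved equation reference requires the classifier's output on the expected (soft) graph to equal its expected output over sampled graphs, and that a linear GNN can be simplified to a sum readout of $A^k$ applied to all-ones features (no weights or normalization). The toy path graph and all function names are hypothetical.

```python
from itertools import product

# Assumed toy setup: a 3-node path with two candidate edges.
EDGES = [(0, 1), (1, 2)]
NUM_NODES = 3


def adjacency(weights):
    """Build a symmetric adjacency matrix from per-edge weights (0/1 or soft)."""
    A = [[0.0] * NUM_NODES for _ in range(NUM_NODES)]
    for (u, v), w in zip(EDGES, weights):
        A[u][v] = A[v][u] = float(w)
    return A


def linear_gnn(A, k):
    """Simplified k-layer linear GNN: sum readout of A^k times all-ones features."""
    h = [1.0] * NUM_NODES
    for _ in range(k):
        h = [sum(A[i][j] * h[j] for j in range(NUM_NODES)) for i in range(NUM_NODES)]
    return sum(h)


def expected_output(probs, k):
    """Exact expectation over all subgraphs, edges sampled independently."""
    total = 0.0
    for mask in product([0, 1], repeat=len(probs)):
        p = 1.0
        for m, a in zip(mask, probs):
            p *= a if m else 1.0 - a
        total += p * linear_gnn(adjacency(mask), k)
    return total


def soft_output(probs, k):
    """Single forward pass on the probability-weighted (soft) graph."""
    return linear_gnn(adjacency(probs), k)


probs = [0.5, 0.5]
gap_k1 = abs(expected_output(probs, 1) - soft_output(probs, 1))
gap_k2 = abs(expected_output(probs, 2) - soft_output(probs, 2))
print("gap with k=1:", gap_k1)
print("gap with k=2:", gap_k2)
```

In this toy setting the one-layer model shows no gap because it is linear in each edge weight, while the two-layer model squares edge weights ($\alpha_e^2 \neq \alpha_e$ for $0 < \alpha_e < 1$), so the soft forward pass no longer matches the expectation.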

Figures (21)

  • Figure 1: Illustration of Subgraph Multilinear Extension approximation failure. In a binary graph classification task, an XGNN must classify whether a graph contains a specific “house” or “cycle” motif, which it does in two steps. (a) Subgraph Extraction: a subgraph extractor computes the sampling probability of each edge using the attention mechanism, which in turn determines the subgraph distribution; part of the “house” or “cycle” motif is sampled with a certain probability. The expected subgraph $\widehat{G}_c$ under the subgraph distribution is a “soft” subgraph in which each edge is weighted by its sampling probability. (b) Subgraph Classification: each subgraph $G_c^i$ corresponds to a label distribution $P(Y|G_c^i)$ (e.g., when key parts of the “house” motif are sampled, the “house” label probability is higher). The final label prediction is conditioned on the subgraph distribution: it averages each $P(Y|G_c^i)$, weighted by the probability of $G_c^i$ being sampled. In the example, the averaged label distribution leads to a prediction of “house”. Instead of averaging the subgraph-conditional label predictions, previous methods feed the expected “soft” subgraph directly into the subgraph classifier GNN $f_c$, which can be biased and lead to an incorrect prediction of “cycle”.
  • Figure 2: Comparison of simulated SubMT and GSAT in terms of counterfactual faithfulness.
  • Figure 3: Ablation studies.
  • Figure 4: Full SCMs on graph distribution shifts [ciga].
  • Figure 5: Bernoulli-parameterized SCM for interpretable GNNs.
  • ...and 16 more figures
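
The two prediction routes contrasted in Figure 1 can be sketched on a two-edge toy problem. Everything here is illustrative: `toy_house_prob` is a hypothetical stand-in for the subgraph classifier $f_c$, the edge probabilities are made up, and independent edge sampling is assumed. The point is only that averaging $P(Y|G_c^i)$ over sampled subgraphs and classifying the single soft subgraph $\widehat{G}_c$ can disagree.

```python
from itertools import product


def toy_house_prob(w):
    """Hypothetical classifier: P(Y = house | edge weights w), nonlinear in w."""
    return (w[0] ** 2 + w[1] ** 2) / 2.0


def subgraph_prob(mask, probs):
    """Probability of sampling the subgraph given by a 0/1 edge mask."""
    p = 1.0
    for m, a in zip(mask, probs):
        p *= a if m else 1.0 - a
    return p


def averaged_prediction(probs, classifier):
    """Average the classifier output over all subgraphs, weighted by their probability."""
    return sum(
        subgraph_prob(mask, probs) * classifier(mask)
        for mask in product([0, 1], repeat=len(probs))
    )


def soft_prediction(probs, classifier):
    """Feed the expected (soft) subgraph to the classifier in one pass."""
    return classifier(probs)


def to_label(p_house):
    return "house" if p_house > 0.5 else "cycle"


edge_probs = [0.7, 0.7]  # assumed attention-derived sampling probabilities
avg = averaged_prediction(edge_probs, toy_house_prob)
soft = soft_prediction(edge_probs, toy_house_prob)
print("averaged:", to_label(avg))
print("soft:", to_label(soft))
```

With these assumed numbers the averaged route gives about 0.7 for “house” while the soft route gives about 0.49, so the two routes predict different labels. Exact enumeration is only feasible for tiny graphs; it is used here so the comparison carries no sampling noise.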

Theorems & Definitions (15)

  • Definition 3.1: Subgraph multilinear extension (SubMT)
  • Definition 3.2: $\epsilon$-SubMT approximation
  • Proposition 3.3
  • Definition 4.1: $(\delta,\epsilon)$-counterfactual fidelity
  • Proposition 4.2
  • Theorem 5.1
  • Definition 4.1: Subgraph multilinear extension (SubMT)
  • Definition 4.2: $\epsilon$-SubMT approximation
  • Definition 4.3: $(\delta,\epsilon)$-counterfactual fidelity
  • Proposition 4.4
  • ...and 5 more