Globally Interpretable Graph Learning via Distribution Matching

Yi Nian, Yurui Chang, Wei Jin, Lu Lin

TL;DR

This work tackles global interpretability for graph neural networks by shifting focus from individual predictions to the training procedure itself. It introduces Graph Distribution Matching (GDM), which synthesizes interpretive graphs for each class by matching their embeddings to those of the model's training data in a trajectory-aware manner, using the maximum mean discrepancy (MMD) across model snapshots. The proposed objective combines distribution alignment with feature and sparsity regularization, enabling differentiable optimization of the interpretive graphs. Empirical results show that interpretive graphs produced by GDM yield high model fidelity and predictive utility, while also offering human-intelligible patterns and significant efficiency gains over prior global-interpretation methods. The framework provides a practical, plug-and-play tool for developers to publish more transparent graph models without compromising training data confidentiality.

Abstract

Graph neural networks (GNNs) have emerged as powerful models for capturing critical graph patterns. Rather than treating them as end-to-end black boxes, a growing body of work attempts to explain model behavior. Existing work mainly focuses on local interpretation, which reveals the discriminative pattern for each individual instance but cannot directly reflect high-level model behavior across instances. To gain global insights, we aim to answer an important question that is not yet well studied: how can we provide a global interpretation of the graph learning procedure? We formulate this problem as globally interpretable graph learning, which aims to distill high-level, human-intelligible patterns that dominate the learning procedure, such that training on these patterns recovers a similar model. As a first step, we propose a novel model fidelity metric tailored to evaluating the model trained on interpretations. Our preliminary analysis shows that interpretive patterns generated by existing global methods fail to recover the model training procedure. We therefore propose Graph Distribution Matching (GDM), which synthesizes interpretive graphs by matching the distributions of the original and interpretive graphs in the GNN's feature space as training proceeds, thereby capturing the most informative patterns the model learns during training. Extensive experiments on graph classification datasets demonstrate multiple advantages of the proposed method, including high model fidelity, predictive accuracy, and time efficiency, as well as the ability to reveal class-relevant structure.
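
The distribution-matching idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: the RBF kernel, the biased MMD estimator, the uniform averaging over snapshots, and the names `rbf_kernel`, `mmd2`, and `trajectory_mmd` are all assumptions made for illustration, and the "snapshots" are stand-in embedding functions on plain feature vectors rather than a GNN on graphs. The regularization terms are omitted.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Pairwise RBF kernel between the rows of x and the rows of y."""
    sq_dist = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2.0 * x @ y.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between two embedding samples."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def trajectory_mmd(embed_fns, real, synth, gamma=1.0):
    """Average squared MMD over a list of embedding functions.

    Each function in embed_fns is a hypothetical stand-in for the encoder
    of one model snapshot saved during training.
    """
    return float(np.mean([mmd2(f(real), f(synth), gamma) for f in embed_fns]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(50, 4))   # stand-in for training-data features
    # Three toy "snapshots": the same nonlinearity at different scales.
    snapshots = [lambda z, s=s: np.tanh(s * z) for s in (0.1, 0.5, 1.0)]
    close = real[:10]                 # small sample drawn from the data
    far = real[:10] + 5.0             # same sample, shifted away
    print("close:", trajectory_mmd(snapshots, real, close))
    print("far:  ", trajectory_mmd(snapshots, real, far))
```

In this toy setting the shifted sample gets a larger trajectory-averaged discrepancy than the sample drawn from the data, which is the signal a gradient-based optimizer would reduce when updating the synthesized graphs.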

Paper Structure

This paper contains 26 sections, 12 equations, 3 figures, 13 tables, 1 algorithm.

Figures (3)

  • Figure 1: Model Fidelity (i.e., cosine similarity between the predictive logits of the original model and those of the surrogate model) and Predictive Accuracy (i.e., the original model's accuracy on the interpretive graphs) as model training proceeds.
  • Figure 2: Overview of the proposed globally interpretable learning framework via Graph Distribution Matching (GDM).
  • Figure 3: Sensitivity analysis of hyper-parameter $\beta$.
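
The two quantities named in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: the function names are hypothetical, logits are assumed to be arrays of shape (n_samples, n_classes), and averaging the per-sample cosine similarity is one plausible reading of the caption.

```python
import numpy as np

def model_fidelity(logits_orig, logits_surr, eps=1e-12):
    """Mean per-sample cosine similarity between two sets of logits."""
    a = np.asarray(logits_orig, dtype=float)
    b = np.asarray(logits_surr, dtype=float)
    dot = (a * b).sum(axis=1)
    # eps guards against division by zero for all-zero logit rows.
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return float((dot / norms).mean())

def predictive_accuracy(logits, labels):
    """Fraction of samples whose arg-max logit matches the label."""
    preds = np.argmax(np.asarray(logits), axis=1)
    return float((preds == np.asarray(labels)).mean())

if __name__ == "__main__":
    orig = np.array([[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]])
    print(model_fidelity(orig, orig))            # identical logits
    print(model_fidelity(orig, -orig))           # opposite logits
    print(predictive_accuracy(orig, [0, 1, 1]))  # two of three correct
```

Here `logits_orig` would come from the original model and `logits_surr` from a surrogate trained on the interpretive graphs, both evaluated on the same inputs.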