Discrete Diffusion-Based Model-Level Explanation of Heterogeneous GNNs with Node Features

Pallabee Das, Stefan Heindorf

TL;DR

DiGNNExplainer introduces a model-level explanation framework for heterogeneous GNNs that generates explanation graphs with authentic node features using discrete diffusion. It couples DiGress for graph structure with a novel DiTabDDPM for discrete node features, enforcing metagraph consistency and selecting a top explanation per class based on GNN predictions. Across real and synthetic datasets, the method achieves superior realism (via distributional similarity) and faithfulness (PF and GF) compared to state-of-the-art baselines, demonstrating the value of incorporating actual node features in explanations. The approach is scalable to varied graph sizes, extensible to directed and broader heterogeneous domains, and offers practical insights for understanding complex HGNN decision-making.

Abstract

Many real-world datasets, such as citation networks, social networks, and molecular structures, are naturally represented as heterogeneous graphs, where nodes belong to different types and have additional features. For example, in a citation network, nodes representing "Paper" or "Author" may include attributes like keywords or affiliations. A critical machine learning task on these graphs is node classification, which is useful for applications such as fake news detection, corporate risk assessment, and molecular property prediction. Although Heterogeneous Graph Neural Networks (HGNNs) perform well in these contexts, their predictions remain opaque. Existing post-hoc explanation methods lack support for actual node features beyond one-hot encoding of node type and often fail to generate realistic, faithful explanations. To address these gaps, we propose DiGNNExplainer, a model-level explanation approach that synthesizes heterogeneous graphs with realistic node features via discrete denoising diffusion. In particular, we generate realistic discrete features (e.g., bag-of-words features) using diffusion models within a discrete space, whereas previous approaches are limited to continuous spaces. We evaluate our approach on multiple datasets and show that DiGNNExplainer produces explanations that are realistic and faithful to the model's decision-making, outperforming state-of-the-art methods.
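To make the idea of diffusion "within a discrete space" concrete, the following is a minimal sketch of a forward (noising) process on a binary bag-of-words vector. It assumes a uniform-resampling corruption step, which is a common choice for discrete diffusion but is not claimed to be the kernel used by DiGNNExplainer. The names `noise_step`, `forward_diffuse`, and the `betas` schedule are hypothetical and chosen for illustration only.

```python
import random
from typing import List


def noise_step(x: List[int], beta: float, rng: random.Random) -> List[int]:
    """One forward step (illustrative assumption, not the paper's kernel):
    each bit is kept with probability 1 - beta, otherwise it is
    resampled uniformly from {0, 1}. The state never leaves {0, 1}."""
    return [rng.randint(0, 1) if rng.random() < beta else bit for bit in x]


def forward_diffuse(x0: List[int], betas: List[float], seed: int = 0) -> List[List[int]]:
    """Apply the corruption step once per entry in `betas` and return
    the whole trajectory, starting with a copy of the clean vector."""
    rng = random.Random(seed)
    trajectory = [list(x0)]
    for beta in betas:
        trajectory.append(noise_step(trajectory[-1], beta, rng))
    return trajectory


if __name__ == "__main__":
    # A toy bag-of-words vector; the noise schedule is made up.
    for step, x in enumerate(forward_diffuse([0, 1, 0, 1, 0, 0], [0.1, 0.3, 0.6, 1.0])):
        print(step, x)
```

Unlike Gaussian noise in a continuous space, every intermediate state here remains a valid binary vector. A generative model would be trained to reverse such a corruption process; that reverse model is not sketched here.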

Paper Structure

This paper contains 50 sections, 9 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Overview of DiGNNExplainer. The example, based on the DBLP dataset, illustrates the process of generating explanations for a node classification task. Nodes represent different entity types: authors (red), papers (blue), conferences (orange), and terms (green). Node features, such as discrete bag-of-words representations of author keywords (e.g., [010100]), are encoded in red. The task is to predict the class of each author, i.e., their research area: database, data mining, artificial intelligence, or information retrieval. (1) & (2) The graph generator takes as input small real-world graphs with discrete node features and produces synthetic graphs with discrete node features. (3) The validity of these generated graphs is checked against the metagraph. (4) A trained GNN, which is to be explained, is applied to each of the nine author nodes of valid, synthesized graphs, producing a prediction for each of the four classes. For each class, we identify the node and corresponding graph with the highest prediction probability, yielding one explanation graph per class. (5) To evaluate our approach, we compute various metrics, including the ground-truth faithfulness between our explanation graphs and the communities within the real dataset.
  • Figure 2: Explanation graphs for each class of author nodes in the DBLP dataset; each graph maximizes the predicted probability of its class. Node colors indicate types: paper (blue), author (red), term (dark green), and conference (orange).
  • Figure 3: Frequency distribution plots of the synthetic bag-of-words feature vector of the author node in the explanation graph that is classified with maximum probability for a class.
  • Figure 4: Density plots of the continuous term-node feature values in the explanation graphs. For each class, we show the feature-value distribution of the term nodes in the explanation graph that maximizes the predicted probability for that class.
  • Figure 5: Diffusion steps for generating a synthetic graph and its discrete node features.
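
Steps (3) and (4) of Figure 1 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the `METAGRAPH` edge set, the dictionary-based graph representation, and the `predict_proba` callable (a stand-in for the trained GNN) are all hypothetical, and edges are treated as undirected.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical metagraph for a DBLP-like schema: allowed node-type pairs.
METAGRAPH: Set[Tuple[str, str]] = {
    ("author", "paper"),
    ("paper", "term"),
    ("paper", "conference"),
}


def is_valid(node_types: Dict[int, str], edges: List[Tuple[int, int]]) -> bool:
    """Step (3): every edge must join two node types that the metagraph
    allows. Edges are treated as undirected (an assumption)."""
    for u, v in edges:
        pair = (node_types[u], node_types[v])
        if pair not in METAGRAPH and pair[::-1] not in METAGRAPH:
            return False
    return True


def select_explanations(
    graphs: List[dict],
    predict_proba: Callable[[dict, int], Sequence[float]],
    target_type: str = "author",
) -> Dict[int, Tuple[float, int, int]]:
    """Step (4): for each class, keep (probability, graph index, node id)
    of the target-type node with the highest predicted probability."""
    best: Dict[int, Tuple[float, int, int]] = {}
    for g_idx, graph in enumerate(graphs):
        if not is_valid(graph["node_types"], graph["edges"]):
            continue  # discard graphs that violate the metagraph
        for node, ntype in graph["node_types"].items():
            if ntype != target_type:
                continue
            # `predict_proba` is a hypothetical stand-in for the trained GNN.
            for cls, p in enumerate(predict_proba(graph, node)):
                if cls not in best or p > best[cls][0]:
                    best[cls] = (p, g_idx, node)
    return best
```

Each entry of the returned dictionary identifies one explanation graph per class, mirroring the selection described in the Figure 1 caption. In practice `predict_proba` would wrap a forward pass of the GNN being explained.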