
Faithful and Accurate Self-Attention Attribution for Message Passing Neural Networks via the Computation Tree Viewpoint

Yong-Min Shin, Siqing Li, Xin Cao, Won-Yong Shin

TL;DR

This work addresses the gap between the widespread use of self-attention in Att-GNNs and the reliability of attention as an explanation. It introduces GAtt, a computation-tree-based edge attribution method that extracts faithful edge contributions from the attention weights of Att-GNNs. By aligning edge attributions with the feed-forward computation tree and enforcing two principles (proximity to the target node and adjustment by path position), GAtt provides a simple, deterministic, and hyperparameter-free way to quantify edge importance. Empirical results show that GAtt substantially improves faithfulness and explanation accuracy over naive attention averaging and compares favorably with several post-hoc explainers across real-world and synthetic datasets, highlighting the potential of attention-based explanations when properly interpreted. The method is model-agnostic within Att-GNNs and admits an efficient matrix-based computation, which enables scalable explanations for large graphs. Code is available for reproducibility.

Abstract

The self-attention mechanism has been adopted in various popular message passing neural networks (MPNNs), enabling the model to adaptively control the amount of information that flows along the edges of the underlying graph. Such attention-based MPNNs (Att-GNNs) have also been used as a baseline in multiple studies on explainable AI (XAI), since attention has steadily come to be seen as a natural model interpretation, a viewpoint already popularized in other domains (e.g., natural language processing and computer vision). However, existing studies often use naive calculations to derive attribution scores from attention, undermining the potential of attention as an interpretation for Att-GNNs. In our study, we aim to fill the gap between the widespread usage of Att-GNNs and their potential explainability via attention. To this end, we propose GAtt, an edge attribution calculation method for self-attention MPNNs based on the computation tree, a rooted tree that reflects the computation process of the underlying model. Despite its simplicity, we empirically demonstrate the effectiveness of GAtt in three aspects of model explanation (faithfulness, explanation accuracy, and case studies) using both synthetic and real-world benchmark datasets. In all cases, the results demonstrate that GAtt greatly improves edge attribution scores, especially compared to the previous naive approach. Our code is available at https://github.com/jordan7186/GAtt.
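For orientation, the naive baseline that the abstract contrasts against (attention averaging, labeled AvgAtt in Figure 1) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `avg_att`, the dictionary-based edge representation, and the toy attention values are all assumptions made for the example.

```python
def avg_att(attn_layers):
    """Naive baseline sketch: average each edge's attention weight over layers.

    attn_layers: list with one dict per layer, mapping a directed edge
    (source, target) to the attention weight of that edge at that layer.
    (Hypothetical data layout chosen for illustration.)
    """
    num_layers = len(attn_layers)
    # Collect every edge that appears in at least one layer.
    edges = set().union(*attn_layers)
    return {
        edge: sum(layer.get(edge, 0.0) for layer in attn_layers) / num_layers
        for edge in edges
    }


# Toy 2-node graph with self-loops; weights into each target sum to 1.
layer1 = {(1, 0): 0.7, (0, 0): 0.3, (0, 1): 0.4, (1, 1): 0.6}
layer2 = {(1, 0): 0.1, (0, 0): 0.9, (0, 1): 0.5, (1, 1): 0.5}

scores = avg_att([layer1, layer2])
```

Note that this score is the same for every target node and ignores where an edge sits in the model's computation, which is the limitation the computation tree viewpoint is meant to address.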

Paper Structure (57 sections, 1 theorem, 16 equations, 14 figures, 17 tables, 1 algorithm)

Key Result

Proposition 2.5

For a given set of attention weights $\mathcal{A} = \{{\bf A}(l)\}_{l=1}^{L}$ for an $L$-layer Att-GNN with $L\geq1$, GAtt in Definition 2.4 is equivalent to

Figures (14)

  • Figure 1: A visualization of our method (GAtt, right) against the previous approach (AvgAtt, left) on the Infection dataset, where the correct infection path is highlighted as the blue nodes.
  • Figure 2: A visualization for a 2-layer Att-GNN on target node 27 on the Infection dataset. One panel shows the local 2-hop subgraph with the edge $e_{40, 27}$ marked in red. The other panel shows the computation tree of the Att-GNN, where information flows from the leaf nodes to node 27 at the root. The edges are colored by the model's attention weights, and the two occurrences of edge $e_{40, 27}$ are highlighted.
  • Figure 3: Case study on the BA-Shapes and Infection datasets for a 2-layer GAT.
  • Figure 4: Runtime comparison on the Infection dataset for calculating GAtt using three different strategies: 1) straightforward computation via constructing a rooted subtree (denoted Subtree), 2) matrix-based computation for each node, using the matrix-form equation in the main manuscript, and 3) batch computation using the paper's batch computation algorithm. The text labels indicate the relative speedups with respect to the original Subtree strategy.
  • Figure 5: A visualization of the computation trees (without color-coding the attention values) for GAT and a graph transformer on a given example graph, each shown in its own subfigure. The target node 0 is colored in yellow.
  • ...and 9 more figures
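
The computation tree described in the Figure 2 caption can be made concrete with a short sketch. The code below enumerates every edge occurrence in the tree rooted at a target node and counts how often a given edge appears. It is an illustration under stated assumptions, not the paper's implementation: the function name `computation_tree_edges`, the toy neighbor lists, and the inclusion of self-loops are all hypothetical choices, and the graph is not the actual Infection dataset.

```python
from collections import Counter


def computation_tree_edges(in_neighbors, target, num_layers):
    """Enumerate edge occurrences in the computation tree rooted at `target`.

    in_neighbors: dict mapping each node to the nodes it aggregates from
    (self-loops listed explicitly; an assumption of this sketch).
    Returns a list of (layer, source, dest) tuples, where layer == num_layers
    is the layer adjacent to the root and layer == 1 is next to the leaves.
    """
    occurrences = []
    frontier = [target]  # tree nodes at the current depth (with repetition)
    for layer in range(num_layers, 0, -1):
        next_frontier = []
        for node in frontier:
            for src in in_neighbors[node]:
                occurrences.append((layer, src, node))
                next_frontier.append(src)
        frontier = next_frontier
    return occurrences


# Hypothetical toy graph: node 27 with neighbors 40 and 5, plus self-loops.
in_neighbors = {27: [27, 40, 5], 40: [40, 27], 5: [5, 27]}

occurrences = computation_tree_edges(in_neighbors, target=27, num_layers=2)
counts = Counter((src, dst) for _, src, dst in occurrences)
layers_of_edge = sorted(l for l, s, d in occurrences if (s, d) == (40, 27))
```

In this toy setup the edge from node 40 to node 27 occurs twice: once directly below the root, and once one level deeper beneath the copy of node 27 reached through its self-loop. A single graph edge can therefore play several roles in the computation, which is why one attribution score per edge needs to account for each occurrence.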

Theorems & Definitions (7)

  • Definition 2.1: Computation tree
  • Definition 2.2: Flow in a computation tree
  • Definition 2.3: Attention flow in a computation tree
  • Example 1
  • Definition 2.4: GAtt
  • Example 2
  • Proposition 2.5