Quantifying the Intrinsic Usefulness of Attributional Explanations for Graph Neural Networks with Artificial Simulatability Studies

Jonas Teufel; Luca Torresi; Pascal Friederich

TL;DR

The paper addresses the challenge of evaluating attributional explanations for graph neural networks without costly human studies by introducing a formal student–teacher framework and the student–teacher simulatability (STS) metric. It demonstrates that explanation supervision can improve main-task performance, especially under harder data regimes or with high-quality explanations, using extensive ablations on synthetic data and validation on real-world molecular datasets. The findings show robustness to moderate explanation noise and reveal that adversarial explanations or overly strong noise can negate benefits, highlighting the nuanced role explanations play in learning. By positioning artificial simulatability as a complementary dimension to faithfulness, the work offers a practical, scalable method for assessing explanation usefulness in graph domains and suggests avenues for extending to other explanation types and educational contexts.

Abstract

Despite the increasing relevance of explainable AI, assessing the quality of explanations remains a challenging issue. Due to the high costs associated with human-subject experiments, various proxy metrics are often used to approximately quantify explanation quality. Generally, one possible interpretation of the quality of an explanation is its inherent value for teaching a related concept to a student. In this work, we extend artificial simulatability studies to the domain of graph neural networks. Instead of costly human trials, we use explanation-supervisable graph neural networks to perform simulatability studies to quantify the inherent usefulness of attributional graph explanations. We perform an extensive ablation study to investigate the conditions under which the proposed analyses are most meaningful. We additionally validate our method's applicability on real-world graph classification and regression datasets. We find that relevant explanations can significantly boost the sample efficiency of graph neural networks, and we analyze the robustness towards noise and bias in the explanations. We believe that the notion of usefulness obtained from our proposed simulatability analysis provides a dimension of explanation quality that is largely orthogonal to the common practice of faithfulness and has great potential to expand the toolbox of explanation quality assessments, specifically for graph explanations.
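
The simulatability analysis described above compares students trained with explanations against students trained without them. A minimal sketch of how such a comparison could be scored follows. It assumes the student-teacher simulatability (STS) value is the difference between the median performance of the explanation students and the median performance of the reference students over repeated runs; this definition, the function name, and the example scores are illustrative assumptions and are not taken from the text above. The sketch also assumes a higher-is-better metric such as accuracy; for an error metric in a regression task the sign would need to be flipped.

```python
from statistics import median


def student_teacher_simulatability(ref_scores, exp_scores):
    """Sketch of an STS-style score (assumed definition, not the authors' code).

    ref_scores: performance of reference students (trained without explanations).
    exp_scores: performance of explanation students (trained with explanations).
    Both lists hold one value per repetition; higher is assumed to be better.
    """
    if not ref_scores or len(ref_scores) != len(exp_scores):
        raise ValueError("expected two non-empty score lists of equal length")
    # Positive values mean the explanation students did better on the main task.
    return median(exp_scores) - median(ref_scores)


# Hypothetical test accuracies from five repetitions (made-up numbers).
ref_scores = [0.70, 0.72, 0.68, 0.71, 0.69]
exp_scores = [0.78, 0.80, 0.75, 0.79, 0.77]
sts = student_teacher_simulatability(ref_scores, exp_scores)
print(f"STS = {sts:.3f}")
```

Using the median rather than the mean keeps the score less sensitive to a single failed training run, which is one plausible reason to prefer it when the number of repetitions is small.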

Paper Structure (19 sections, 1 equation, 6 figures, 4 tables)

Introduction
Related Work
Simulatability Studies
Explanation Supervision for GNNs
Student-Teacher Analysis of Explanation Quality
Computational Experiments
Ablation Study for a Synthetic Graph Classification Dataset
Student Model Implementations
Training Dataset Size Sweep
Explanation Noise Sweep
Adversarial Explanation Sweep
Student Network Layer Structure
Node versus Edge Explanations
Real-World Datasets
Mutagenicity - Graph Classification
...and 4 more sections

Figures (6)

Figure 1: Illustration of the student-teacher training workflow and the setting of our artificial simulatability study.
Figure 2: Synthetic dataset used to quantify the usefulness of attributional graph explanations, including tests of the robustness toward adversarial explanations.
Figure 3: Results of student-teacher analyses ($R=25$) for different training dataset sizes. Each column shows the performance distribution for the reference student (blue) and the explanation student (green) of the student-teacher procedure. The number above each column is the resulting $\operatorname{STS}$ value. $\text{(*)}$ indicates statistical significance according to a paired t-test with $p<5\%$.
Figure 4: Results of student-teacher analyses ($R=25$) for explanations with different ratios of additional explanation noise. Each column shows the performance distribution for the reference student (blue) and the explanation student (green) of the student-teacher procedure. The number above each column is the resulting $\operatorname{STS}$ value. $\text{(*)}$ indicates statistical significance according to a paired t-test with $p<5\%$.
Figure 5: Results of student-teacher analyses ($R=25$) for datasets containing different amounts of adversarial incorrect explanations. Each column shows the performance distribution for the reference student (blue) and the explanation student (green) of the student-teacher procedure. The number above each column is the resulting $\operatorname{STS}$ value. $\text{(*)}$ indicates statistical significance according to a paired t-test with $p<5\%$.
...and 1 more figure
