Graph Counterfactual Explainable AI via Latent Space Traversal

Andreas Abildtrup Hansen; Paraskevas Pegios; Anna Calissano; Aasa Feragen

TL;DR

This work addresses explainability for graph classifiers by producing counterfactual explanations through latent-space traversal in a permutation-equivariant graph VAE (PEGVAE). Counterfactuals are generated by moving the latent code $\bm{z}$ across the classifier decision boundary with gradient updates $\bm{z}_{i+1} = \bm{z}_i - \epsilon \nabla \mathcal{L}(\bm{z}_i, y_D)$, with graphs reconstructed by a decoder $\mathcal{D}$ and predictions given by a classifier $\mathcal{C}$. The approach removes the need for an explicit graph-distance metric and produces in-distribution counterfactual graphs $G^{CF} = \mathcal{D}(\bm{z}^{CF})$. Empirical results on three molecular graph datasets show that classifier-guided counterfactuals achieve robust trade-offs between identity preservation and validity and outperform several baselines.
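The update rule above can be illustrated with a minimal toy sketch. It is not the authors' implementation: the linear decoder, the logistic classifier, the binary cross-entropy loss, the stopping rule, and all names (`decode`, `classify`, `traverse`, `W`, `w`, `b`) are assumptions chosen so that the gradient can be written by hand. The sketch also assumes that $y_D$ denotes the desired class and that the loss is evaluated on the decoded latent code.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode(z, W):
    """Hypothetical linear decoder D: latent code -> feature vector."""
    return [sum(W[r][c] * z[c] for c in range(len(z))) for r in range(len(W))]

def classify(x, w, b):
    """Hypothetical logistic classifier C: probability of class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def traverse(z0, y_d, W, w, b, eps=0.5, max_steps=1000):
    """Apply z_{i+1} = z_i - eps * grad L(z_i, y_d) until C(D(z)) predicts y_d."""
    z = list(z0)
    # Chain rule through the linear decoder: d(logit)/dz = W^T w.
    a = [sum(W[r][c] * w[r] for r in range(len(W))) for c in range(len(z))]
    for step in range(max_steps):
        p = classify(decode(z, W), w, b)
        predicted = 1 if p >= 0.5 else 0
        if predicted == y_d:
            return z, step  # decision boundary has been crossed
        # Gradient of binary cross-entropy w.r.t. z is (p - y_d) * W^T w.
        grad = [(p - y_d) * ai for ai in a]
        z = [zi - eps * gi for zi, gi in zip(z, grad)]
    return z, max_steps

# Toy parameters (assumed for illustration only).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [1.0, -1.0, 0.5]
b = 0.0
z0 = [-1.0, 1.0]  # starts on the class-0 side
z_cf, steps = traverse(z0, y_d=1, W=W, w=w, b=b)
x_cf = decode(z_cf, W)  # toy analogue of G^CF = D(z^CF)
```

In this toy setting the traversal stops as soon as the predicted class flips, which keeps the counterfactual latent code close to the starting point; a real graph decoder would additionally need a step that turns continuous outputs into a discrete graph.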

Abstract

Explaining the predictions of a deep neural network is a nontrivial task, yet high-quality explanations for predictions are often a prerequisite for practitioners to trust these models. Counterfactual explanations aim to explain predictions by finding the "nearest" in-distribution alternative input whose prediction changes in a pre-specified way. However, it remains an open question how to define this nearest alternative input; the answer depends on both the domain (e.g. images, graphs, or tabular data) and the specific application considered. For graphs, this problem is complicated i) by their discrete nature, as opposed to the continuous nature of state-of-the-art graph classifiers; and ii) by the node permutation group acting on the graphs. We propose a method to generate counterfactual explanations for any differentiable black-box graph classifier, utilizing a case-specific permutation equivariant graph variational autoencoder. We generate counterfactual explanations in a continuous fashion by traversing the latent space of the autoencoder across the classification boundary of the classifier, allowing for seamless integration of discrete graph structure and continuous graph attributes. We empirically validate the approach on three graph datasets, showing that our model is consistently high-performing and more robust than the baselines.
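Point ii) in the abstract, the action of node permutations, can be made concrete with a small sketch. It is a generic illustration of permutation invariance and equivariance on an adjacency matrix, not part of the paper's model; the helper names (`permute_adj`, `degrees`, `num_edges`) and the example graph are hypothetical.

```python
def permute_adj(A, perm):
    """Relabel nodes so that new node i is old node perm[i]."""
    n = len(A)
    return [[A[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

def degrees(A):
    """Node degrees: a permutation-equivariant node-level feature."""
    return [sum(row) for row in A]

def num_edges(A):
    """Edge count of an undirected graph: a permutation-invariant feature."""
    return sum(sum(row) for row in A) // 2

# A 4-node undirected path 1 - 0 - 2 - 3, written as an adjacency matrix.
A = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
perm = [2, 0, 3, 1]
A_p = permute_adj(A, perm)

# Equivariance: permuting the input permutes the node-level output the same way.
equivariant = degrees(A_p) == [degrees(A)[p] for p in perm]
# Invariance: the graph-level output does not change under relabeling.
invariant = num_edges(A_p) == num_edges(A)
```

The two matrices `A` and `A_p` differ entry by entry yet describe the same graph, which is why a naive entry-wise distance between adjacency matrices is a poor notion of "nearest" for graphs.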

Paper Structure (31 sections, 8 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Background
Counterfactual Explanations
Equivariant graph generative models.
Method
Graph Representation
Invariance and Equivariance to Permutation
PEGVAE
Classifier Design
Generating Counterfactuals via Latent Space Traversal
Inferring Graph Reconstruction.
Algorithmic Representation.
Experiments
Data
Evaluation Metrics
...and 16 more sections

Figures (3)

Figure 1: Top: The classifier architecture and the PEGVAE. Bottom: The counterfactual graph generation.
Figure 2: Trade-off between metrics for Identity Preservation and Validity.
Figure A.1: Illustration of how generating counterfactual explanations based on the nearest neighbor can produce low validity.
