Graph Diffusion Counterfactual Explanation
David Bechtoldt, Sidney Bender
TL;DR
This work addresses explainability for predictions on graph-structured data by introducing Graph Diffusion Counterfactual Explanation (GDCE), a diffusion-based generator of counterfactuals for an input graph $G=(X,E)$. The method uses discrete diffusion with classifier-free guidance: it perturbs the graph to an intermediate noised state $G_\tau$ and then steers the reverse process toward a target $y_1$, yielding a counterfactual $G_{CF}$ that flips the prediction while staying on the data manifold. It is demonstrated on planar graphs and on a large molecular dataset (ZINC-250k), where it shows high validity and accuracy. The diffusion step $\tau$ governs a tunable trade-off: a larger $\tau$ permits larger edits at the cost of similarity to the original graph. The diffusion prior helps balance preserving structure against achieving the target property, enabling scalable, domain-valid counterfactuals for graph domains such as drug design and network analysis.
Abstract
Machine learning models that operate on graph-structured data, such as molecular graphs or social networks, often make accurate predictions but offer little insight into why those predictions are made. Counterfactual explanations address this challenge by seeking the closest alternative scenario in which the model's prediction would change. Although counterfactual explanations have been studied extensively for tabular data and computer vision, the graph domain remains comparatively underexplored. Constructing graph counterfactuals is intrinsically difficult because graphs are discrete, non-Euclidean objects. We introduce Graph Diffusion Counterfactual Explanation, a novel framework for generating counterfactual explanations on graph data that combines discrete diffusion models with classifier-free guidance. We empirically demonstrate that our method reliably generates counterfactuals that are in-distribution and differ minimally in structure from the input, for both discrete classification targets and continuous properties.
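
The two ingredients named above, partial noising to an intermediate step $\tau$ and classifier-free guidance during the reverse process, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a uniform-resampling noise model for categorical edge states and a log-space guidance rule, neither of which is specified in the text, and all names (noise_edges, guided_probs, keep_prob, scale) are hypothetical.

```python
import math
import random


def noise_edges(edges, keep_prob, num_states, rng):
    """Forward-noise categorical edge states to an intermediate step.

    ASSUMPTION: uniform-resampling noise. Each entry keeps its state with
    probability keep_prob (which would shrink as tau grows); otherwise it is
    resampled uniformly from num_states categories.
    """
    return [
        e if rng.random() < keep_prob else rng.randrange(num_states)
        for e in edges
    ]


def guided_probs(logp_cond, logp_uncond, scale):
    """Classifier-free guidance for one categorical variable, in log space.

    ASSUMPTION: guided log-probabilities are
        log p_uncond + scale * (log p_cond - log p_uncond),
    renormalized. scale = 0 gives the unconditional distribution,
    scale = 1 the conditional one, and scale > 1 pushes further toward
    the target condition.
    """
    mixed = [u + scale * (c - u) for c, u in zip(logp_cond, logp_uncond)]
    top = max(mixed)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in mixed]
    total = sum(weights)
    return [w / total for w in weights]


def num_edits(original, candidate):
    """Count the positions where two edge-state lists differ."""
    return sum(a != b for a, b in zip(original, candidate))


if __name__ == "__main__":
    rng = random.Random(0)
    edges = [0, 1, 0, 0, 1, 0]  # toy graph: 0 = no edge, 1 = edge
    noised = noise_edges(edges, keep_prob=0.8, num_states=2, rng=rng)
    print("edits after noising:", num_edits(edges, noised))

    cond = [math.log(0.7), math.log(0.3)]    # toy conditional prediction
    uncond = [math.log(0.5), math.log(0.5)]  # toy unconditional prediction
    print("guided distribution:", guided_probs(cond, uncond, scale=2.0))
```

In this toy setting, lowering keep_prob plays the role of choosing a larger $\tau$: more of the original structure is destroyed before the guided reverse process rebuilds it, so the result can move further from the input.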
