A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges
Mario Alfonso Prado-Romero, Bardh Prenkaj, Giovanni Stilo, Fosca Giannotti
TL;DR
This survey addresses graph counterfactual explanations (GCE) for Graph Neural Networks by introducing a uniform formalism and a comprehensive taxonomy, enabling consistent comparisons across methods. It consolidates definitions for minimal and multi-class counterfactuals and documents standardized evaluation protocols and datasets, supplemented by an empirical study via the GRETEL framework. Key contributions include a formal multi-class GCE definition, a ten-dimensional method taxonomy, and a reproducible evaluation platform that highlights strengths and pitfalls across domains such as social, molecular, and -omics graphs. The work also discusses privacy and fairness considerations and maps out open challenges, guiding future research toward model-level explanations, edge-prediction tasks, and robust benchmarks with open competitions. The findings indicate that no single method dominates across all settings: effectiveness is highly domain- and task-dependent, which calls for diverse, reproducible evaluation practices in graph counterfactual explainability.
Abstract
Graph Neural Networks (GNNs) perform well in community detection and molecule classification. Counterfactual Explanations (CE) provide counter-examples to overcome the transparency limitations of black-box models. Given the growing attention to graph learning, we focus on the concepts of CE for GNNs. We analysed the state of the art (SoA) to provide a taxonomy, a uniform notation, and an overview of the benchmarking datasets and evaluation metrics. We discuss fourteen methods, their evaluation protocols, twenty-two datasets, and nineteen metrics. We integrated the majority of the methods into the GRETEL library and conducted an empirical evaluation to understand their strengths and pitfalls. We highlight open challenges and future work.
