Reconsidering Faithfulness in Regular, Self-Explainable and Domain Invariant GNNs
Steve Azzolin, Antonio Longa, Stefano Teso, Andrea Passerini
TL;DR
This work studies faithfulness in GNN explanations, formalizing it via sufficiency and necessity and introducing a harmonic-mean score, $Faith_{d,p_R,p_C}(R_A)$. It shows that faithfulness metrics are not interchangeable: different perturbation schemes and interventional distributions $p_R$ and $p_C$ yield divergent assessments, so the parameters of a metric need to be disclosed. The authors prove a no-go result for injective regular GNNs, for which perfectly faithful explanations are uninformative, and show that modular GNNs (SE-GNNs and DI-GNNs) can produce non-trivial, strictly faithful explanations under specific architectural strategies, albeit with trade-offs in expressivity and learnability. They also prove that faithfulness is tightly linked to domain invariance, giving a bound on the ID–OOD gap that depends on both invariance and sufficiency, and report empirical evidence across SE-GNN and DI-GNN benchmarks. The study offers practical guidelines for evaluating faithfulness and highlights its role in robust domain generalization; the code is open source.
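To make the harmonic-mean combination concrete, here is a minimal sketch in Python. It assumes that sufficiency and necessity have already been estimated and normalized to $[0, 1]$, with higher meaning better. That normalization, and the names `harmonic_mean` and `faith_score`, are assumptions made for illustration and are not taken from the authors' code. The dependence on $d$, $p_R$ and $p_C$ enters through how the two inputs are estimated, which is not modeled here.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative scores; 0.0 if either one is 0."""
    if a < 0 or b < 0:
        raise ValueError("scores must be non-negative")
    if a == 0 or b == 0:
        return 0.0
    return 2.0 * a * b / (a + b)


def faith_score(sufficiency: float, necessity: float) -> float:
    """Hypothetical helper: combine the two components into one score.

    Assumes both inputs are normalized to [0, 1] with higher = better
    (an assumption of this sketch, not a detail stated in the text).
    """
    return harmonic_mean(sufficiency, necessity)


if __name__ == "__main__":
    # An explanation that is sufficient but not necessary scores low overall.
    print(faith_score(0.9, 0.1))
    # Balanced components score close to their shared value.
    print(faith_score(0.8, 0.8))
```

The harmonic mean is dominated by the weaker component, so under these assumptions an explanation cannot score well by being only sufficient or only necessary.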
Abstract
As Graph Neural Networks (GNNs) become more pervasive, it becomes paramount to build reliable tools for explaining their predictions. A core desideratum is that explanations are \textit{faithful}, i.e., that they portray an accurate picture of the GNN's reasoning process. However, a number of different faithfulness metrics exist, raising the question of what exactly faithfulness is and how to achieve it. We make three key contributions. We begin by showing that \textit{existing metrics are not interchangeable} -- i.e., explanations attaining high faithfulness according to one metric may be unfaithful according to others -- and can systematically ignore important properties of explanations. We proceed to show that, surprisingly, \textit{optimizing for faithfulness is not always a sensible design goal}. Specifically, we prove that for injective regular GNN architectures, perfectly faithful explanations are completely uninformative. This does not apply to modular GNNs, such as self-explainable and domain-invariant architectures, prompting us to study the relationship between architectural choices and faithfulness. Finally, we show that \textit{faithfulness is tightly linked to out-of-distribution generalization}, in that simply ensuring that a GNN can correctly recognize the domain-invariant subgraph, as prescribed by the literature, does not guarantee that it is invariant unless this subgraph is also faithful. The code is publicly available on GitHub.
