L2XGNN: Learning to Explain Graph Neural Networks
Giuseppe Serra, Mathias Niepert
TL;DR
L2XGNN addresses the faithfulness gap in GNN explanations by building the learning-to-explain (L2X) paradigm directly into standard GNNs. An upstream model assigns a weight to each edge, and a constrained-optimization sampler based on perturb-and-MAP uses these weights to draw an explanatory subgraph. Only this subgraph is then used in message passing. Implicit maximum-likelihood estimation (I-MLE) makes it possible to backpropagate through the discrete sampler. The framework reaches graph-classification accuracy competitive with baselines that use the full input graph, while its explanations align with ground-truth motifs and help debug shortcut learning. The result is motif-based interpretability without sacrificing predictive performance, which is valuable in graph domains where interpretable structure matters.
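The sampling step and its backward pass can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the simplest constraint (select exactly `k` edges) and Gumbel perturbations, and it does not reproduce the connectivity constraint or the exact noise distribution used by L2XGNN. The function names (`gumbel`, `top_k_map`, `perturb_and_map`, `imle_grad`) and the step size `lam` are hypothetical. The backward pass follows the generic I-MLE recipe: shift the edge scores against the downstream gradient, solve the MAP problem again with the same noise, and use the difference of the two solutions as the gradient estimate.

```python
import math
import random


def gumbel(rng):
    # Standard Gumbel sample via the inverse CDF: -log(-log(U)), U ~ Uniform(0, 1).
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return -math.log(-math.log(u))


def top_k_map(scores, k):
    # MAP state under an "exactly k edges" constraint (hypothetical stand-in for
    # L2XGNN's constraints): indicator vector of the k highest-scoring edges.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(order[:k])
    return [1.0 if i in chosen else 0.0 for i in range(len(scores))]


def perturb_and_map(theta, k, noise):
    # Perturb the edge scores with noise, then solve the constrained MAP problem.
    return top_k_map([t + e for t, e in zip(theta, noise)], k)


def imle_grad(theta, k, noise, grad_z, lam=1.0):
    # I-MLE-style estimate of dL/dtheta given dL/dz at the sampled mask z.
    z = perturb_and_map(theta, k, noise)
    # Target scores move against the downstream gradient (step size lam).
    theta_target = [t - lam * g for t, g in zip(theta, grad_z)]
    # Reuse the same noise so that the two MAP states are directly comparable.
    z_target = perturb_and_map(theta_target, k, noise)
    return [a - b for a, b in zip(z, z_target)]


if __name__ == "__main__":
    theta = [2.0, 0.5, 1.5, -1.0]  # hypothetical scores for four edges
    zero_noise = [0.0] * len(theta)
    print(perturb_and_map(theta, 2, zero_noise))  # [1.0, 0.0, 1.0, 0.0]
    # The downstream loss would prefer edge 1 over edge 2:
    print(imle_grad(theta, 2, zero_noise, [0.0, -1.0, 1.0, 0.0]))  # [0.0, -1.0, 1.0, 0.0]
    rng = random.Random(0)
    print(perturb_and_map(theta, 2, [gumbel(rng) for _ in theta]))  # a random 2-edge mask
```

A gradient-descent step `theta - lr * grad` with this estimate raises the score of edge 1 and lowers that of edge 2, which is the direction the downstream loss asks for; edges whose membership does not change receive a zero gradient.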
Abstract
Graph Neural Networks (GNNs) are a popular class of machine learning models. Inspired by the learning to explain (L2X) paradigm, we propose L2XGNN, a framework for explainable GNNs which provides faithful explanations by design. L2XGNN learns a mechanism for selecting explanatory subgraphs (motifs) which are exclusively used in the GNN's message-passing operations. L2XGNN is able to select, for each input graph, a subgraph with specific properties such as being sparse and connected. Imposing such constraints on the motifs often leads to more interpretable and effective explanations. Experiments on several datasets suggest that L2XGNN achieves the same classification accuracy as baseline methods using the entire input graph while ensuring that only the provided explanations are used to make predictions. Moreover, we show that L2XGNN is able to identify motifs responsible for the graph properties it is intended to predict.
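The guarantee that only the explanation is used for prediction can be illustrated with a toy message-passing layer. This is a hypothetical sketch, not L2XGNN's architecture: scalar node features, sum aggregation with a self term, two rounds, and a sum readout are all assumptions, as are the names `masked_message_passing` and `predict`. Messages travel only along edges whose mask entry is 1, so an edge outside the selected subgraph can be deleted from the input without changing the output. How nodes not covered by the explanation enter the readout is also an assumption; here they simply keep their own features.

```python
def masked_message_passing(x, edges, mask, rounds=1):
    # x: scalar node features; edges: undirected (u, v) pairs;
    # mask: 0/1 per edge, where 1 marks an edge of the explanatory subgraph.
    h = list(x)
    for _ in range(rounds):
        new_h = list(h)  # self term: each node keeps its current state
        for (u, v), m in zip(edges, mask):
            if m:  # only selected edges carry messages (in both directions)
                new_h[v] += h[u]
                new_h[u] += h[v]
        h = new_h
    return h


def predict(x, edges, mask):
    # Toy graph-level score: sum readout over node states after two rounds.
    return sum(masked_message_passing(x, edges, mask, rounds=2))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    edges = [(0, 1), (1, 2), (2, 3)]
    # Explanation keeps edges (0, 1) and (1, 2) and drops (2, 3).
    print(predict(x, edges, [1, 1, 0]))  # 38.0
    # Removing the unselected edge from the input leaves the prediction unchanged.
    print(predict(x, [(0, 1), (1, 2)], [1, 1]))  # 38.0
```

Using the full edge set (`mask = [1, 1, 1]`) changes the score, which shows that the masked model's output depends on the unselected edge only through its absence from message passing.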
