Improving the interpretability of GNN predictions through conformal-based graph sparsification
Pablo Sanchez-Martin, Kinaan Aamir Khan, Isabel Valera
TL;DR
CORES tackles the interpretability gap in GNNs by learning a predictive subgraph $\\ackslashmathcal{G}_s$ during training through node/edge removal without assuming subgraph structure. It combines reinforcement learning (policy gradient with PPO) and conformal prediction to guide sparsification with a two-part reward $R = \lambda R_p + (1-\lambda) R_s$ and a conformal-based uncertainty mechanism. The approach solves a bi-level optimization where the inner loop optimizes graph-classification performance on $\\(\\mathcal{G}_s$ while the outer loop learns the sparsification policy, yielding sparser, more interpretable predictions while maintaining competitive accuracy across nine graph datasets. Empirically, CORESN and CORESE achieve significantly sparser predictive subgraphs compared to baselines, with competitive or superior performance, and demonstrate qualitative subgraph motifs aligned with domain knowledge, albeit at the cost of higher training time due to RL components.
Abstract
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in solving graph classification tasks. However, most GNN architectures aggregate information from all nodes and edges in a graph, regardless of their relevance to the task at hand, thus hindering the interpretability of their predictions. In contrast to prior work, in this paper we propose a GNN \emph{training} approach that jointly i) finds the most predictive subgraph by removing edges and/or nodes -- -\emph{without making assumptions about the subgraph structure} -- while ii) optimizing the performance of the graph classification task. To that end, we rely on reinforcement learning to solve the resulting bi-level optimization with a reward function based on conformal predictions to account for the current in-training uncertainty of the classifier. Our empirical results on nine different graph classification datasets show that our method competes in performance with baselines while relying on significantly sparser subgraphs, leading to more interpretable GNN-based predictions.
