B-XAIC Dataset: Benchmarking Explainable AI for Graph Neural Networks Using Chemical Data
Magdalena Proszewska, Tomasz Danel, Dawid Rymarczyk
TL;DR
B-XAIC introduces a large real-world benchmark for explainable AI in chemistry, pairing 50K molecular graphs with ground-truth atom- and bond-level rationales across seven tasks. It enables direct evaluation of both post-hoc and intrinsically interpretable GNNs by separating null and subgraph explanations and evaluating node- and edge-level fidelity. Across experiments, high predictive accuracy (e.g., for GIN) coexists with inconsistent and sometimes misleading explanations, underscoring the limitations of current XAI techniques for molecular graphs. The benchmark provides a rigorous, shareable standard to drive the development of faithful, robust explainability methods for graph-based drug discovery and materials design.
Abstract
Understanding the reasoning behind deep learning model predictions is crucial in cheminformatics and drug discovery, where a molecule's structure determines its properties. However, current evaluation frameworks for Explainable AI (XAI) in this domain often rely on artificial datasets or simplified tasks, and they employ data-derived metrics that fail to capture the complexity of real-world scenarios and lack a direct link to explanation faithfulness. To address this, we introduce B-XAIC, a novel benchmark constructed from real-world molecular data and diverse tasks with known ground-truth rationales for the assigned labels. Through a comprehensive evaluation using B-XAIC, we reveal limitations of existing XAI methods for Graph Neural Networks (GNNs) in the molecular domain. This benchmark provides a valuable resource for gaining deeper insight into the faithfulness of XAI and facilitates the development of more reliable and interpretable models.
