Unveiling Molecular Moieties through Hierarchical Grad-CAM Graph Explainability
Salvatore Contino, Paolo Sortino, Maria Rita Gulotta, Ugo Perricone, Roberto Pirrone
TL;DR
The paper tackles the interpretability challenge in GNN-based virtual screening by introducing the Hierarchical Grad-CAM graph Explainer (HGE), which provides atom-, ring-, and molecule-level explanations for activity predictions across 20 kinase targets. Twenty graph convolutional neural networks (GCNNs) trained on two kinome-focused datasets (EMBER and ChEMBL_over) achieve state-of-the-art sensitivity and strong enrichment in top-prediction metrics, while HGE reveals consistent moieties responsible for binding and supports drug repurposing insights. Validation against DrugBank inhibitors and comparisons with GNNExplainer indicate that HGE offers richer, chemistry-grounded explanations that align with pharmacophoric features mapped by RDKit. Overall, the approach pairs strong predictive performance with mechanistic interpretability, enabling more informed structure optimization and faster hit discovery in drug design pipelines.
Abstract
Background: Virtual Screening (VS) has become an essential tool in drug discovery, enabling the rapid and cost-effective identification of potential bioactive molecules. Among recent advancements, Graph Neural Networks (GNNs) have gained prominence for their ability to model complex molecular structures using graph-based representations. However, integrating explainable methods that elucidate the specific contributions of molecular substructures to biological activity remains a significant challenge. This limitation hampers both the interpretability of predictive models and the rational design of novel therapeutics.
Results: We trained 20 GNN models on a dataset of small molecules with the goal of predicting their activity on 20 distinct protein targets from the kinase family. These classifiers achieved state-of-the-art performance in virtual screening tasks, demonstrating high accuracy and robustness across targets. Building upon these models, we implemented the Hierarchical Grad-CAM graph Explainer (HGE) framework, enabling an in-depth analysis of the molecular moieties driving protein-ligand binding stabilization. HGE exploits Grad-CAM explanations at the atom, ring, and whole-molecule levels, leveraging the message-passing mechanism to highlight the most relevant chemical moieties. Validation against experimental data from the literature confirmed the ability of the explainer to recognize the molecular patterns of known drugs and correctly associate them with their known targets.
Conclusion: Our approach may help shorten both the screening and the hit discovery processes. Detailed knowledge of the molecular substructures that play a role in the binding process can give computational chemists insight into structure optimization as well as drug repurposing tasks.
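To make the idea of Grad-CAM on a molecular graph more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy one-layer graph convolution with mean pooling and a linear readout, so the gradient of the score with respect to the atom features can be written by hand. The graph, the weights, and the helper names (`normalized_adjacency`, `grad_cam_atoms`, `aggregate`) are all hypothetical. Averaging atom scores to obtain ring- and molecule-level values is also an assumption made for illustration; HGE may derive its hierarchy differently.

```python
# Illustrative sketch only: toy Grad-CAM on a hand-written one-layer graph convolution.
# The graph, weights, and aggregation rule are hypothetical, not taken from the paper.

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def normalized_adjacency(edges, n):
    """Row-normalized adjacency with self-loops: each atom averages itself and its neighbours."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]

def grad_cam_atoms(adj, x, w_conv, w_out):
    """Atom-level relevance for the model score = mean_pool(ReLU(A X W)) . w_out."""
    feats = relu(matmul(adj, matmul(x, w_conv)))  # one message-passing step
    n = len(feats)
    # With mean pooling and a linear readout, d(score)/d(feats[a][k]) = w_out[k] / n,
    # so the Grad-CAM channel weight (gradient averaged over atoms) has the same value.
    alphas = [w / n for w in w_out]
    # Relevance of each atom: ReLU of the alpha-weighted sum of its feature channels.
    return [max(0.0, sum(a * f for a, f in zip(alphas, row))) for row in feats]

def aggregate(heat, groups):
    """Assumed aggregation: mean atom relevance over each group of atom indices."""
    return {name: sum(heat[i] for i in idx) / len(idx) for name, idx in groups.items()}

# Toy molecule: a three-membered ring (atoms 0, 1, 2) with a two-atom chain (3, 4).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # one-hot atom types
w_conv = [[0.5, -0.2], [0.1, 0.8]]  # hypothetical convolution weights
w_out = [1.0, -0.5]                 # hypothetical readout weights

atom_heat = grad_cam_atoms(normalized_adjacency(edges, len(x)), x, w_conv, w_out)
levels = aggregate(atom_heat, {"ring": [0, 1, 2], "molecule": list(range(len(x)))})
print("atom relevance:", atom_heat)
print("ring / molecule relevance:", levels)
```

In a trained network the gradients would come from automatic differentiation rather than a closed form, and ring membership would typically be obtained from a cheminformatics toolkit such as RDKit instead of being listed by hand.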
