Unveiling Molecular Moieties through Hierarchical Grad-CAM Graph Explainability

Salvatore Contino; Paolo Sortino; Maria Rita Gulotta; Ugo Perricone; Roberto Pirrone

TL;DR

The paper tackles the interpretability challenge in GNN-based virtual screening by introducing the Hierarchical Grad-CAM graph Explainer (HGE), which provides atom-, ring-, and molecule-level explanations for activity predictions across 20 kinase targets. Twenty GCNNs trained on two kinome-focused datasets (EMBER and ChEMBL_over) achieve state-of-the-art sensitivity and strong enrichment in top-prediction metrics, while HGE reveals consistent moieties responsible for binding and supports drug repurposing insights. Validation against DrugBank inhibitors and comparison with GNNExplainer show that HGE offers richer, chemistry-grounded explanations that align with the pharmacophoric features mapped by RDKit. Overall, HGE pairs strong predictive performance with mechanistic interpretability, enabling more informed structure optimization and faster hit discovery in drug design pipelines.

Abstract

Background: Virtual Screening (VS) has become an essential tool in drug discovery, enabling the rapid and cost-effective identification of potential bioactive molecules. Among recent advancements, Graph Neural Networks (GNNs) have gained prominence for their ability to model complex molecular structures using graph-based representations. However, integrating explainable methods that elucidate the specific contributions of molecular substructures to biological activity remains a significant challenge. This limitation hampers both the interpretability of predictive models and the rational design of novel therapeutics.

Results: We trained 20 GNN models on a dataset of small molecules to predict their activity on 20 distinct protein targets from the Kinase family. These classifiers achieved state-of-the-art performance in virtual screening tasks, demonstrating high accuracy and robustness across targets. Building upon these models, we implemented the Hierarchical Grad-CAM graph Explainer (HGE) framework, enabling an in-depth analysis of the molecular moieties driving protein-ligand binding stabilization. HGE exploits Grad-CAM explanations at the atom, ring, and whole-molecule levels, leveraging the message-passing mechanism to highlight the most relevant chemical moieties. Validation against experimental data from the literature confirmed that the explainer recognizes the molecular patterns of known drugs and correctly assigns them to their known targets.

Conclusion: Our approach can help shorten both the screening and the hit discovery processes. Detailed knowledge of the molecular substructures that play a role in binding can give the computational chemist insight into structure optimization, as well as into drug repurposing tasks.
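The abstract does not give the HGE computation itself, so the following is only a minimal sketch of the generic Grad-CAM rule applied to a graph: each feature channel is weighted by its gradient averaged over atoms, and the weighted sum of activations is passed through a ReLU to give one relevance score per atom. The function names, the toy numbers, and the choice to summarize a ring or the whole molecule by averaging atom scores are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence


def grad_cam_atom_scores(
    activations: Sequence[Sequence[float]],
    gradients: Sequence[Sequence[float]],
) -> List[float]:
    """Generic Grad-CAM relevance per atom.

    activations[n][k]: feature k of atom n at a chosen graph-conv layer.
    gradients[n][k]:   derivative of the class score w.r.t. that activation.
    """
    n_atoms = len(activations)
    n_feats = len(activations[0])
    # Channel weights: gradient of each feature averaged over all atoms.
    alphas = [
        sum(gradients[n][k] for n in range(n_atoms)) / n_atoms
        for k in range(n_feats)
    ]
    # Per-atom relevance: ReLU of the alpha-weighted sum of activations.
    return [
        max(0.0, sum(alphas[k] * activations[n][k] for k in range(n_feats)))
        for n in range(n_atoms)
    ]


def aggregate_scores(
    atom_scores: Sequence[float],
    groups: Dict[str, Sequence[int]],
) -> Dict[str, float]:
    """Mean atom relevance per named group of atom indices.

    Averaging is an assumption used here to illustrate ring- and
    molecule-level summaries; it is not taken from the paper.
    """
    return {
        name: sum(atom_scores[i] for i in idx) / len(idx)
        for name, idx in groups.items()
    }


# Toy example: 3 atoms, 2 feature channels (values are made up).
activations = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
gradients = [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]

atom_scores = grad_cam_atom_scores(activations, gradients)
group_scores = aggregate_scores(
    atom_scores,
    {"ring": [0, 1], "molecule": [0, 1, 2]},  # hypothetical atom groupings
)
print(atom_scores)
print(group_scores)
```

In this toy case only the first atom receives non-zero relevance, because the second channel carries a negative weight and the ReLU clips the resulting negative contributions. In practice the activations and gradients would come from a trained GCNN via automatic differentiation, and ring memberships from a cheminformatics toolkit.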

Paper Structure (17 sections, 7 equations, 6 figures, 9 tables)

Background
Methods
Data preparation
Data set 1: EMBER
Data set 2: ChEMBL_over
Graph Convolutional Neural Network
Explainability analysis
Results
Metrics
Results on EMBER data set
Results on ChEMBL_over data set
Explainability results
Discussion
Conclusion
Data and Code Availability
...and 2 more sections

Figures (6)

Figure 1: Heatmap of the similarity matrix obtained between the oversampled and ChEMBL molecules.
Figure 2: The GCNN classifier architecture along with the HGE explainers.
Figure 3: Predictions obtained with HGE for Apigenin on CDK6 and CK2A1. H-bond acceptor or donor groups are highlighted in yellow, hydrophobic groups are colored in blue, and the aromatic groups are green.
Figure 4: Predictions obtained with HGE for Trilaciclib on CDK2 and CDK6. H-bond acceptor or donor groups are highlighted in yellow, hydrophobic groups are colored in blue, and the aromatic groups are green.
Figure 5: Chemical features identified by HGE in a) Apigenin and Chrysin when assigned to the CDK6 protein and b) Pacritinib and SB1578 when assigned to the JAK2 protein. H-bond acceptor or donor groups are highlighted in yellow, hydrophobic groups are colored in blue, and the aromatic groups are green.
...and 1 more figures
