Geometric Multi-color Message Passing Graph Neural Networks for Blood-brain Barrier Permeability Prediction

Trung Nguyen, Md Masud Rana, Farjana Tasnim Mukta, Chang-Guo Zhan, Duc Duy Nguyen

TL;DR

The paper tackles BBBP prediction by addressing a limitation of topology-only GNNs: they ignore three-dimensional molecular geometry. It introduces GMC-MPNN, a geometry-aware graph neural network that uses weighted colored subgraphs to encode atom-type-specific spatial interactions and long-range effects, and fuses these with conventional atomic features. Evaluated on three BBBP benchmarks with scaffold-based splits, GMC-MPNN achieves state-of-the-art AUC-ROC on both classification benchmarks and the best RMSE and Pearson correlation on the regression benchmark, demonstrating robust generalization to diverse chemical scaffolds. An ablation study shows that both common and rare atom-pair motifs contribute meaningfully to predictions, underscoring the value of geometry-informed representations in drug discovery pipelines.
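
A scaffold-based split keeps every molecule that shares a core scaffold in the same partition, so the test set contains only scaffolds the model never saw during training. The following is a minimal sketch of a deterministic scaffold split in Python, assuming a scaffold string has already been computed for each molecule (in practice, for example, a Bemis-Murcko scaffold from RDKit). The function name `scaffold_split` and the 80/10/10 fractions are illustrative assumptions, not the paper's protocol. A scaffold-balanced split, as implemented in Chemprop for instance, differs in that it shuffles the order of the smaller scaffold groups.

```python
from collections import defaultdict

def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Deterministic scaffold split (illustrative sketch, not the paper's code).

    scaffolds: list where scaffolds[i] is the precomputed scaffold key of molecule i.
    Returns (train, valid, test) lists of molecule indices; no scaffold spans two sets.
    """
    # Group molecule indices by their scaffold.
    groups = defaultdict(list)
    for idx, scaf in enumerate(scaffolds):
        groups[scaf].append(idx)
    # Largest scaffold groups first, so rare scaffolds tend to end up in valid/test.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n = len(scaffolds)
    train_cap = round(frac_train * n)
    valid_cap = round((frac_train + frac_valid) * n)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

For ten molecules with scaffolds `list("AAAABBCDEF")`, the call returns molecules 0 to 7 (scaffolds A, B, C, D) for training, molecule 8 (scaffold E) for validation, and molecule 9 (scaffold F) for testing.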

Abstract

Accurate prediction of blood-brain barrier permeability (BBBP) is essential for central nervous system (CNS) drug development. While graph neural networks (GNNs) have advanced molecular property prediction, they often rely on molecular topology and neglect the three-dimensional geometric information crucial for modeling transport mechanisms. This paper introduces the geometric multi-color message-passing graph neural network (GMC-MPNN), a novel framework that enhances standard message-passing architectures by explicitly incorporating atomic-level geometric features and long-range interactions. Our model constructs weighted colored subgraphs based on atom types to capture the spatial relationships and chemical context that govern BBB permeability. We evaluated GMC-MPNN on three benchmark datasets for both classification and regression tasks, using rigorous scaffold-based splitting to ensure a robust assessment of generalization. The results demonstrate that GMC-MPNN consistently outperforms existing state-of-the-art models, achieving superior performance in both classifying compounds as permeable/non-permeable (AUC-ROC of 0.947 and 0.9212) and in regressing continuous permeability values (RMSE of 0.5628, Pearson correlation of 0.6947). An ablation study further quantified the impact of specific atom-pair interactions, revealing that the model's predictive power derives from its ability to learn from both common and rare, but chemically significant, functional motifs. By integrating spatial geometry into the graph representation, GMC-MPNN sets a new performance benchmark and offers a more accurate and generalizable tool for drug discovery pipelines.
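
As Figure 1 illustrates, each colored subgraph $G_{k_1-k_2}$ keeps only the atoms of two element types and connects $k_1$-colored atoms to $k_2$-colored atoms with distance-dependent weights, from which a weighted adjacency matrix $A$ and a weighted Laplacian $L = D - A$ follow. The following is a minimal Python sketch under stated assumptions. The generalized exponential kernel $\exp(-(r/\eta)^\kappa)$, the values of `eta`, `kappa`, and `cutoff`, and the function name `colored_subgraph` are hypothetical choices, because this summary does not give the paper's kernel or parameters. Whether covalently bonded pairs are excluded is also not stated here, so the sketch keeps them.

```python
import math

def colored_subgraph(atoms, k1, k2, eta=3.0, kappa=2.0, cutoff=12.0):
    """Build the weighted colored subgraph G_{k1-k2} of a molecule (sketch).

    atoms: list of (element, (x, y, z)) tuples with 3D coordinates.
    Kernel form exp(-(r/eta)**kappa), eta, kappa and cutoff are assumptions.
    Returns (node elements, weighted adjacency A, weighted Laplacian L).
    """
    # Keep only atoms whose element is one of the two colors.
    nodes = [(el, xyz) for el, xyz in atoms if el in (k1, k2)]
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edges connect a k1-colored atom to a k2-colored atom only.
            if {nodes[i][0], nodes[j][0]} != {k1, k2}:
                continue
            r = math.dist(nodes[i][1], nodes[j][1])
            if r <= cutoff:
                A[i][j] = A[j][i] = math.exp(-((r / eta) ** kappa))
    # Weighted Laplacian L = D - A, where D holds the weighted degrees.
    L = [[sum(A[i]) if i == j else -A[i][j] for j in range(n)] for i in range(n)]
    return [el for el, _ in nodes], A, L

# Toy example: two N atoms, one O atom, and one C atom (C is not in G_{N-O}).
atoms = [("N", (0.0, 0.0, 0.0)), ("O", (1.5, 0.0, 0.0)),
         ("N", (0.0, 2.0, 0.0)), ("C", (1.0, 1.0, 1.0))]
elements, A, L = colored_subgraph(atoms, "N", "O")
rigidity = [L[i][i] for i in range(len(L))]  # per-atom weighted degree
```

In this toy case the carbon atom is dropped, the two nitrogen atoms share no edge, and each diagonal entry of $L$ (the per-atom "rigidity" in this sketch) sums that atom's kernel weights to atoms of the other color.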

Paper Structure

This paper contains 12 sections, 11 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: An illustration of the construction of weighted adjacency and weighted Laplacian matrices from the weighted colored subgraphs of a molecule. The top row shows an example molecule, 8-chlorotheophylline ($\text{C}_7\text{H}_7\text{Cl}\text{N}_4\text{O}_2$; CHEBI:59771), its colored graph structure, and three example colored subgraphs, $G_{N-O}$, $G_{Cl-O}$, and $G_{N-Cl}$. The bottom row shows the weighted adjacency matrix (A) and weighted Laplacian matrix (L) generated from the example subgraph $G_{N-O}$.
  • Figure 2: An illustration of the GMC-MPNN atom-level fusion graph model. 1) Construct geometric graph learning (GGL) atom features from statistics (sum, mean, median, etc.) of the rigidity of the molecular graphs. 2) Convert the molecular SMILES string to a molecular graph using RDKit and combine its atom features with the GGL features to form new atom-level features. 3) Pass these combined features through a message-passing neural network to update all feature vectors, followed by an aggregation function and a feed-forward neural network for property prediction.
  • Figure 3: Performance comparison of different models on the $\text{BBBP}_{\text{cls}}^{\text{MolNet}}$ dataset. The red bars highlight our GMC-MPNN model.
  • Figure 4: Performance comparison of different models on the $\text{BBBP}_{\text{cls}}^{\text{B3DB}}$ dataset, evaluated using scaffold-balanced splits.
  • Figure 5: Performance comparison of different models on the $\text{BBBP}_{\text{reg}}^{\text{B3DB}}$ dataset in terms of RMSE and Pearson correlation. The red bars show the performance of our geometric graph learning-based model, GMC-MPNN. We compare against several baseline and state-of-the-art models: GCN kipf2016semi, NF nf, D-MPNN heid2023chemprop, GIN genova2017graph, GAT velickovic2017graph, Weave kearnes2016molecular, AttentiveFP xiong2019pushing, CoMPT compt, CD-MVGNN cdmvgnn, GSL-MPP gslmpp, and MPNN gilmer2017neural.
  • ...and 1 more figure
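
The three-step pipeline in Figure 2 (geometric atom statistics, feature fusion, message passing with aggregation) can be sketched in pure Python. This toy version assumes each atom's GGL features are the (sum, mean, median) of its kernel-weighted edges in a single colored subgraph, following the examples named in the caption, and that conventional atom features are already available (in the paper they come from RDKit). The functions `ggl_atom_stats`, `fuse`, `message_pass`, and `readout` are hypothetical names. The update is a plain, weight-free sum over bonded neighbors rather than a learned MPNN such as D-MPNN, and the feed-forward prediction head is omitted.

```python
import statistics

def ggl_atom_stats(weights_per_atom):
    """Per-atom (sum, mean, median) of colored-edge weights; zeros if none."""
    feats = []
    for ws in weights_per_atom:
        if ws:
            feats.append([sum(ws), statistics.fmean(ws), statistics.median(ws)])
        else:
            feats.append([0.0, 0.0, 0.0])
    return feats

def fuse(base_feats, ggl_feats):
    """Concatenate conventional atom features with geometric (GGL) features."""
    return [list(b) + list(g) for b, g in zip(base_feats, ggl_feats)]

def message_pass(h, bonds, steps=1):
    """Weight-free sum aggregation: h_i <- h_i + sum of neighbor vectors h_j."""
    nbrs = {i: [] for i in range(len(h))}
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(steps):
        # The comprehension reads the previous h, so all atoms update in sync.
        h = [[h[i][k] + sum(h[j][k] for j in nbrs[i]) for k in range(len(h[i]))]
             for i in range(len(h))]
    return h

def readout(h):
    """Sum-pool atom vectors into one molecule-level vector."""
    return [sum(col) for col in zip(*h)]

base = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # toy one-hot atom features
weights = [[0.5], [0.5, 0.25], []]           # toy colored-edge weights per atom
h = fuse(base, ggl_atom_stats(weights))
mol_vec = readout(message_pass(h, bonds=[(0, 1), (1, 2)]))
```

For this three-atom chain, `mol_vec` evaluates to `[4.0, 3.0, 3.25, 2.125, 2.125]`. A trained model would replace the fixed sum update with learned weight matrices and nonlinearities and pass the pooled vector to a feed-forward network for the permeability prediction.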