Knowledge-aware contrastive heterogeneous molecular graph learning

Mukun Chen, Jia Wu, Shirui Pan, Fu Lin, Bo Du, Xiuwen Gong, Wenbin Hu

TL;DR

The paper tackles molecular property and drug–drug interaction (DDI) prediction by moving beyond homogeneous graphs to knowledge-enhanced heterogeneous molecular graphs. It introduces Knowledge-aware Contrastive Heterogeneous Molecular Graph Learning (KCHML), a tripartite-view framework consisting of a Molecular View, an Element View, and a Drug View, processed by a dual message-passing Heterogeneous Molecular Graph (HMG) encoder and optimized via cross-view contrastive learning. The training objective sums three cross-view losses, ${\mathcal L}_{\text{total}} = {\mathcal L}^{M,EM} + {\mathcal L}^{M,DM} + {\mathcal L}^{EM,DM}$, enabling unified pretraining on large unlabeled data such as 250k molecules from ZINC15 and DRKG-derived chemical knowledge. The approach outperforms the compared baselines on MoleculeNet property-prediction tasks and on the TwoSide DDI dataset in both transductive and inductive settings. Ablation analyses identify the element view as a critical component and show that the multiview strategy generalizes robustly to unseen drugs.
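The total objective above can be illustrated with a minimal sketch. It assumes each cross-view term is a symmetric InfoNCE loss over a batch, with row i of one view and row i of the other forming a positive pair; the summary does not give the exact form of the individual losses, so the `info_nce` helper, the `temperature` value, and the reading of the superscripts M, EM, and DM as the molecular, element, and drug views are assumptions made for illustration only.

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE between two views (assumed form, not taken from the paper).

    Row i of `a` and row i of `b` are treated as a positive pair;
    every other row in the batch acts as a negative.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N) scaled cosine similarities

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))

def total_loss(z_m, z_em, z_dm, temperature=0.1):
    """L_total = L^{M,EM} + L^{M,DM} + L^{EM,DM}: one term per pair of views."""
    return (info_nce(z_m, z_em, temperature)
            + info_nce(z_m, z_dm, temperature)
            + info_nce(z_em, z_dm, temperature))

# Hypothetical embeddings for a batch of 8 molecules, 16 dimensions each.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = total_loss(z, z, z)
misaligned = total_loss(z, np.roll(z, 1, axis=0), np.roll(z, 2, axis=0))
print(aligned < misaligned)
```

When the three views agree on every molecule, the loss is lower than when the rows are shifted so that positives no longer match, which is the behavior a cross-view contrastive objective is meant to reward.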

Abstract

Molecular representation learning is pivotal in predicting molecular properties and advancing drug design. Traditional methodologies, which predominantly rely on homogeneous graph encoding, are limited by their inability to integrate external knowledge and represent molecular structures across different levels of granularity. To address these limitations, we propose a paradigm shift by encoding molecular graphs into heterogeneous structures, introducing a novel framework: Knowledge-aware Contrastive Heterogeneous Molecular Graph Learning (KCHML). This approach leverages contrastive learning to enrich molecular representations with embedded external knowledge. KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction. Extensive benchmarking demonstrates KCHML's superiority over state-of-the-art molecular property prediction models, underscoring its ability to capture intricate molecular features.

Paper Structure

This paper contains 1 section, 14 equations, 6 figures, 6 tables, 2 algorithms.

Table of Contents

  1. DDI Prediction

Figures (6)

  • Figure 1: Homogeneous graphs struggle to capture the nuanced granularity of molecules and hinder the integration of external knowledge. Introducing external knowledge through heterogeneous graphs mainly involves two challenges: (1) heterogeneous graph encoding and (2) imbalanced knowledge sampling.
  • Figure 2: Illustration of the KCHML model. (a) illustrates the three views of the HMG: building on the molecular view, the element view is formed by adding two types of nodes and five types of edges, and the drug view is formed by adding one type of node and two types of edges. (b) describes the encoding process of the HMG encoder in detail. Lines of different colors indicate the source of the ($\mathbf{Q}, \mathbf{K}, \mathbf{V}$) of different nodes and edges. For example, the message of node $v_t$ is formed by edge $e_{st}$, and the message of edge $e_{st}$ is provided by $v_s$ and $e_{ts}$. (c) describes how contrastive learning sample pairs are constructed across multiple views.
  • Figure 3: Comparison of MPNN, DMPNN, GROVER, CMPNN, and KCHML
  • Figure 4: Positive pairs and negative pairs between any two views. Red points denote positive pairs and blue points denote negative pairs. Each line forms a term in the loss function.
  • Figure 5: Fine-tuning process for molecular property and DDI prediction tasks. The Projector used during pre-training is discarded, and the MP-Predictor and DDI-Predictor are employed for molecular property prediction and DDI prediction tasks, respectively.
  • ...and 1 more figure
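The dual message-passing pattern described in the Figure 2 caption (the message of edge $e_{st}$ comes from $v_s$ and $e_{ts}$, and the message of node $v_t$ comes from its incoming edges $e_{st}$) can be sketched as follows. This is a simplified illustration only: the caption indicates that the HMG encoder uses ($\mathbf{Q}, \mathbf{K}, \mathbf{V}$) attention, whereas this sketch swaps in plain summation, ignores node and edge types, and uses a hypothetical function name and toy graph.

```python
import numpy as np

def dual_message_passing_step(node_h, edge_h, edges):
    """One round of node/edge message passing with sum aggregation (assumed).

    node_h: (num_nodes, d) node states.
    edge_h: (num_edges, d) states of directed edges.
    edges:  list of (s, t) pairs; each bond appears in both directions.
    """
    index = {e: i for i, e in enumerate(edges)}

    # Edge update: e_st reads from its source node v_s and its reverse edge e_ts.
    new_edge_h = np.zeros_like(edge_h)
    for i, (s, t) in enumerate(edges):
        new_edge_h[i] = node_h[s] + edge_h[index[(t, s)]]

    # Node update: v_t aggregates the updated states of its incoming edges e_st.
    new_node_h = np.zeros_like(node_h)
    for i, (s, t) in enumerate(edges):
        new_node_h[t] += new_edge_h[i]

    return new_node_h, new_edge_h

# Toy path graph 0 - 1 - 2 with one-hot node states and zero edge states.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
node_h = np.eye(3)
edge_h = np.zeros((4, 3))
new_node_h, new_edge_h = dual_message_passing_step(node_h, edge_h, edges)
print(new_node_h[1])
```

After one round, the middle node has received information from both neighbors through the directed edges that point at it. Whether nodes read the updated or the previous edge states is a design choice made here for brevity, not something the caption specifies.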