Knowledge-aware contrastive heterogeneous molecular graph learning
Mukun Chen, Jia Wu, Shirui Pan, Fu Lin, Bo Du, Xiuwen Gong, Wenbin Hu
TL;DR
The paper tackles molecular property and drug–drug interaction (DDI) prediction by moving beyond homogeneous graphs to knowledge-enhanced heterogeneous molecular graphs. It introduces Knowledge-aware Contrastive Heterogeneous Molecular Graph Learning (KCHML), a tripartite-view framework consisting of a Molecular View, an Element View, and a Drug View, processed by a dual message-passing Heterogeneous Molecular Graph (HMG) encoder and optimized via cross-view contrastive learning. The training objective sums three pairwise cross-view losses, ${\mathcal L}_{\text{total}} = {\mathcal L}^{M,EM} + {\mathcal L}^{M,DM} + {\mathcal L}^{EM,DM}$, enabling unified pretraining on large unlabeled data such as 250k molecules from ZINC15 and DRKG-derived chemical knowledge. The approach outperforms baselines on MoleculeNet property-prediction tasks and on the TwoSide DDI dataset in both transductive and inductive settings. Ablation analyses identify the element view as a critical component and show that the multiview strategy generalizes robustly to unseen drugs.
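To make the structure of the total objective concrete, the sketch below sums three pairwise cross-view losses over toy embeddings. It is a minimal illustration, not the authors' implementation: this summary does not state the form of each cross-view loss, so an InfoNCE-style loss with cosine similarity is assumed, and the names `info_nce`, `total_loss`, `z_m`, `z_em`, `z_dm`, and the temperature `tau` are hypothetical, chosen only to mirror the superscripts $M$, $EM$, and $DM$.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, tau=0.5):
    """Assumed InfoNCE-style loss between two views of the same batch.

    Row i of view_a and row i of view_b embed the same molecule (the
    positive pair); every other row of view_b acts as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / tau for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / n


def total_loss(z_m, z_em, z_dm, tau=0.5):
    """L_total = L^{M,EM} + L^{M,DM} + L^{EM,DM}, each term assumed InfoNCE."""
    return (
        info_nce(z_m, z_em, tau)
        + info_nce(z_m, z_dm, tau)
        + info_nce(z_em, z_dm, tau)
    )


# Toy embeddings for two molecules under each of the three views.
z_m = [[1.0, 0.0], [0.0, 1.0]]
z_em = [[0.9, 0.1], [0.2, 0.8]]
z_dm = [[0.8, 0.3], [0.1, 1.0]]
print(total_loss(z_m, z_em, z_dm))
```

Under this assumed loss, each pairwise term shrinks when the embeddings of the same molecule agree across two views and grows when they are mismatched, which is the behaviour a cross-view contrastive objective is meant to encourage.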
Abstract
Molecular representation learning is pivotal in predicting molecular properties and advancing drug design. Traditional methodologies, which predominantly rely on homogeneous graph encoding, are limited by their inability to integrate external knowledge and represent molecular structures across different levels of granularity. To address these limitations, we propose a paradigm shift by encoding molecular graphs into heterogeneous structures, introducing a novel framework: Knowledge-aware Contrastive Heterogeneous Molecular Graph Learning (KCHML). This approach leverages contrastive learning to enrich molecular representations with embedded external knowledge. KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction. Extensive benchmarking demonstrates KCHML's superiority over state-of-the-art molecular property prediction models, underscoring its ability to capture intricate molecular features.
