Molecular Graph Representation Learning via Structural Similarity Information
Chengyu Yao, Hong Huang, Hang Gao, Fengge Wu, Haiming Chen, Junsuo Zhao
TL;DR
The paper addresses a limitation of many molecular GNNs: they ignore structural similarity between molecules. It introduces MSSM-GNN, which builds an MSSM graph by mapping each molecule to a motif-based representation through a motif dictionary and quantifying cross-molecule similarity with a Mahalanobis Weisfeiler-Lehman Shortest-Path (MWLSP) graph kernel, and then learns embeddings with GNNs on this graph. The work makes three core contributions: a motif-based molecular representation; the MWLSP kernel, which captures both length and position information; and a GNN that achieves state-of-the-art results on five molecular benchmarks and on large-scale Open Graph Benchmark datasets. Ablations validate each component. Empirically, MSSM-GNN consistently improves over eleven baselines, supporting the value of incorporating global structural similarity into molecular representations. By enabling more accurate property prediction through global, similarity-aware molecular representations, the work has practical implications for drug discovery and chemical safety assessment.
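The kernel step can be illustrated with a small sketch. This summary does not give the exact MWLSP formulation (its Mahalanobis weighting and position encoding), so the code below swaps in a plain Weisfeiler-Lehman shortest-path kernel: at each WL iteration it counts (label, label, shortest-path length) triples and adds the dot product of the two graphs' counts. The motif-level input graphs, motif labels such as "benzene" and "COOH", and all function names are hypothetical illustrations, not the authors' motif dictionary or code.

```python
from collections import Counter, deque


def bfs_distances(adj, src):
    """Unweighted shortest-path lengths from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wl_relabel(adj, labels):
    """One Weisfeiler-Lehman iteration: new label = own label + sorted neighbour labels."""
    return {u: f"{labels[u]}|{','.join(sorted(labels[v] for v in adj[u]))}" for u in adj}


def sp_features(adj, labels):
    """Count (label_a, label_b, path_length) triples over unordered node pairs."""
    feats = Counter()
    for u in sorted(adj):
        for v, d in bfs_distances(adj, u).items():
            if v > u:  # each unordered pair once; skips the self-pair at distance 0
                a, b = sorted((labels[u], labels[v]))
                feats[(a, b, d)] += 1
    return feats


def wl_sp_kernel(g1, g2, iterations=2):
    """Sum of shortest-path feature dot products over WL iterations 0..iterations.

    Hypothetical stand-in for MWLSP: no Mahalanobis weighting is applied here.
    """
    (adj1, lab1), (adj2, lab2) = g1, g2
    total = 0
    for _ in range(iterations + 1):
        f1, f2 = sp_features(adj1, lab1), sp_features(adj2, lab2)
        total += sum(count * f2[key] for key, count in f1.items())
        lab1, lab2 = wl_relabel(adj1, lab1), wl_relabel(adj2, lab2)
    return total


# Hypothetical motif-level graphs: (adjacency list, motif label per node).
g_a = ({0: [1], 1: [0, 2], 2: [1]}, {0: "benzene", 1: "CH2", 2: "COOH"})
g_b = ({0: [1], 1: [0]}, {0: "benzene", 1: "COOH"})
g_c = ({0: [1], 1: [0, 2], 2: [1]}, {0: "benzene", 1: "CH2", 2: "OH"})
print(wl_sp_kernel(g_a, g_b), wl_sp_kernel(g_a, g_c), wl_sp_kernel(g_b, g_b))  # 0 1 3
```

Here g_a and g_b contain the same benzene and COOH motifs but at different path lengths, so they share no features, while g_a and g_c share the benzene-CH2 path at distance 1. This shows why encoding path length (not just motif counts) changes the similarity score.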
Abstract
Graph Neural Networks (GNNs) have been widely employed for feature representation learning on molecular graphs, and their effectiveness depends on the expressiveness of these feature representations, which makes enhancing that expressiveness crucial. However, much current research focuses primarily on structural features within individual molecules and overlooks the structural similarity between molecules, a crucial aspect that encodes rich information about the relationship between molecular properties and structural characteristics. As a result, these approaches fail to capture semantic information at the level of molecular structure. To bridge this gap, we introduce the Molecular Structural Similarity Motif GNN (MSSM-GNN), a novel molecular graph representation learning method that captures structural similarity among molecules from a global perspective. Specifically, we propose a purpose-built graph that uses graph kernel algorithms to quantify the similarity between molecules. We then employ GNNs to learn feature representations from molecular graphs, aiming to improve property prediction accuracy by incorporating this additional molecular representation information. Finally, experiments on both small-scale and large-scale molecular datasets show that our model consistently outperforms eleven state-of-the-art baselines. The code is available at https://github.com/yaoyao-yaoyao-cell/MSSM-GNN.
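Once pairwise kernel values are available, they can be turned into a molecule-level graph for a GNN to operate on. The sketch below assumes a cosine-normalized kernel matrix and a k-nearest-neighbour rule for adding edges, and it replaces a learned GNN layer with one parameter-free, similarity-weighted mean-aggregation step. These choices, the toy kernel matrix, and all function names are hypothetical and are not taken from the MSSM-GNN repository.

```python
import math


def normalize_kernel(K):
    """Cosine-normalize a kernel matrix: S_ij = K_ij / sqrt(K_ii * K_jj)."""
    n = len(K)
    return [
        [
            K[i][j] / math.sqrt(K[i][i] * K[j][j]) if K[i][i] > 0 and K[j][j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def knn_similarity_graph(S, k=1):
    """Link each molecule to its k most similar other molecules (undirected, weighted)."""
    n = len(S)
    edges = {}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: -S[i][j])
        for j in others[:k]:
            if S[i][j] > 0:  # skip molecules with no measured similarity
                edges[(min(i, j), max(i, j))] = S[i][j]
    return edges


def propagate(features, edges):
    """One mean-aggregation step: each molecule averages itself with weighted neighbours."""
    neighbours = {i: [] for i in range(len(features))}
    for (a, b), w in edges.items():
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))
    out = []
    for i, x in enumerate(features):
        acc, total = list(x), 1.0  # self-loop with weight 1
        for j, w in neighbours[i]:
            acc = [s + w * y for s, y in zip(acc, features[j])]
            total += w
        out.append([s / total for s in acc])
    return out


# Toy usage with a hypothetical 3-molecule kernel matrix and 2-d molecule features.
K = [[4, 2, 0], [2, 4, 1], [0, 1, 1]]
edges = knn_similarity_graph(normalize_kernel(K), k=1)
h = propagate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], edges)
print(edges)  # {(0, 1): 0.5, (1, 2): 0.5}
```

A trained model would apply learnable weights and a nonlinearity after aggregation; the fixed averaging here only shows how similarity-weighted neighbours mix into each molecule's representation.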
