
Molecular Graph Representation Learning via Structural Similarity Information

Chengyu Yao, Hong Huang, Hang Gao, Fengge Wu, Haiming Chen, Junsuo Zhao

TL;DR

The paper addresses the limitation that many molecular GNNs ignore inter-molecule structural similarity. It introduces MSSM-GNN, which constructs an MSSM graph by mapping molecules to motif-based representations through a motif dictionary and quantifying cross-molecule similarity with a Mahalanobis Weisfeiler-Lehman Shortest-Path (MWLSP) graph kernel, and then learns embeddings with GNNs on this graph. The approach makes three core contributions: a motif-based molecular representation, an MWLSP kernel that captures both length and position information, and a GNN that achieves state-of-the-art results on five molecular benchmarks and on large-scale Open Graph Benchmark datasets; ablations validate the components. Empirically, MSSM-GNN shows robust improvements over eleven baselines, supporting the value of incorporating global structural similarity into molecular representations. The work has practical implications for drug discovery and chemical safety assessment, since global, similarity-aware molecular representations enable more accurate property prediction.
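
The kernel step can be illustrated with a minimal, stdlib-only Python sketch of the standard Weisfeiler-Lehman shortest-path (WL-SP) kernel, the technique the MWLSP name suggests it extends. This is not the paper's MWLSP kernel: the Mahalanobis feature term and the position information are omitted, atoms stand in for the motif-based representations the paper uses, graphs are toy adjacency lists with atom-symbol labels, and all function names (`wl_sp_kernel`, `sp_features`, `wl_refine`, `bfs_distances`) are hypothetical.

```python
from collections import Counter, deque

# Minimal WL shortest-path kernel sketch (hypothetical names). It omits the
# Mahalanobis feature term and the position information of the paper's MWLSP kernel.


def bfs_distances(adj, src):
    """Unweighted shortest-path lengths from src to every reachable node (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wl_refine(adj, labels):
    """One WL step: new label = (old label, sorted multiset of neighbour labels)."""
    return {u: (labels[u], tuple(sorted(labels[v] for v in adj[u]))) for u in adj}


def sp_features(adj, labels):
    """Count (label_u, label_v, shortest-path length) over unordered pairs u != v."""
    feats = Counter()
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:  # skip disconnected pairs
                a, b = sorted((labels[u], labels[v]))
                feats[(a, b, dist[v])] += 1
    return feats


def wl_sp_kernel(g1, g2, h_iters=2):
    """Sum over WL iterations 0..h_iters of the dot product of SP feature counts."""
    (adj1, lab1), (adj2, lab2) = g1, g2
    total = 0
    for _ in range(h_iters + 1):
        f1, f2 = sp_features(adj1, lab1), sp_features(adj2, lab2)
        total += sum(count * f2[key] for key, count in f1.items())
        lab1, lab2 = wl_refine(adj1, lab1), wl_refine(adj2, lab2)
    return total


# Toy heavy-atom graphs: ethanol (C-C-O) and propane (C-C-C).
ethanol = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
propane = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "C"})
print(wl_sp_kernel(ethanol, propane, h_iters=1))  # 2
print(wl_sp_kernel(ethanol, ethanol, h_iters=1))  # 6
```

Only the iteration-0 feature (C, C, distance 1) is shared between ethanol and propane, so after one WL refinement the two molecules no longer share any labelled paths; each extra refinement lets the kernel distinguish larger neighbourhoods.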

Abstract

Graph Neural Networks (GNNs) have been widely employed for feature representation learning in molecular graphs. Therefore, it is crucial to enhance the expressiveness of feature representation to ensure the effectiveness of GNNs. However, a significant portion of current research primarily focuses on the structural features within individual molecules, often overlooking the structural similarity between molecules, which is a crucial aspect encapsulating rich information on the relationship between molecular properties and structural characteristics. Thus, these approaches fail to capture the rich semantic information at the molecular structure level. To bridge this gap, we introduce the Molecular Structural Similarity Motif GNN (MSSM-GNN), a novel molecular graph representation learning method that can capture structural similarity information among molecules from a global perspective. In particular, we propose a specially designed graph that leverages graph kernel algorithms to represent the similarity between molecules quantitatively. Subsequently, we employ GNNs to learn feature representations from molecular graphs, aiming to enhance the accuracy of property prediction by incorporating additional molecular representation information. Finally, through a series of experiments conducted on both small-scale and large-scale molecular datasets, we demonstrate that our model consistently outperforms eleven state-of-the-art baselines. The code is available at https://github.com/yaoyao-yaoyao-cell/MSSM-GNN.
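
Once pairwise kernel values are available, one simple way to turn them into a molecule-level similarity graph and propagate information over it is sketched below. The cosine normalization, the fixed `threshold` edge rule, and the parameter-free mean-aggregation step (a stand-in for a trained GNN layer) are illustrative assumptions, not the construction used by MSSM-GNN; the function names are hypothetical.

```python
import math

# Hypothetical sketch: kernel matrix -> molecule similarity graph -> one
# mean-aggregation step. The threshold rule and the aggregation are
# illustrative assumptions, not the MSSM-GNN construction.


def normalize_kernel(K):
    """Cosine-normalize a kernel matrix: K[i][j] / sqrt(K[i][i] * K[j][j])."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)] for i in range(n)]


def similarity_graph(K, threshold):
    """Link molecules i != j whose normalized similarity is at least `threshold`."""
    S = normalize_kernel(K)
    n = len(S)
    return {i: [j for j in range(n) if j != i and S[i][j] >= threshold] for i in range(n)}


def mean_aggregate(adj, feats):
    """One message-passing step: average each molecule's vector with its neighbours'."""
    out = []
    for i, x in enumerate(feats):
        group = [x] + [feats[j] for j in adj[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# Toy kernel matrix for three molecules and 2-d molecule embeddings.
K = [[4.0, 2.0, 0.0],
     [2.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
adj = similarity_graph(K, threshold=0.5)
print(adj)  # {0: [1], 1: [0], 2: []}
print(mean_aggregate(adj, [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]))
# [[0.5, 0.5], [0.5, 0.5], [2.0, 2.0]]
```

Normalizing first keeps large molecules, whose raw kernel values tend to be larger, from dominating the edge rule; molecule 2 has no neighbour above the threshold, so its embedding passes through unchanged.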

Paper Structure (31 sections, 1 theorem, 13 equations, 3 figures, 3 tables, 2 algorithms)

Key Result

Proposition (time complexity of the MWLSP kernel)

Let $n$ be the average number of nodes per graph and $d$ the dimensionality of the node features, so that each node is associated with a $d$-dimensional feature vector. The time complexity of computing the MWLSP kernel is $O(n^3 + n^4(1 + Hnd^3))$.
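
A simplified form of this bound follows under stated assumptions: $n, d \ge 1$ and $H \ge 1$, reading $H$ as the number of Weisfeiler-Lehman iterations (the proposition as extracted here does not define it). Since $n^3 \le n^4 \le Hn^5d^3$ under these assumptions,

$$
O\big(n^3 + n^4(1 + Hnd^3)\big) = O\big(n^3 + n^4 + Hn^5d^3\big) = O\big(Hn^5d^3\big).
$$

If the bound is per pair of molecules, filling a full kernel matrix for $N$ molecules requires $N(N+1)/2$ kernel evaluations, for a total of $O(N^2 H n^5 d^3)$.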

Figures (3)

  • Figure 1: Examples showing that molecules with similar structures often exhibit similar properties, a phenomenon observed in the biological and chemical domains.
  • Figure 2: The framework of our proposed Molecular Structural Similarity Motif Graph Neural Network.
  • Figure 3: Performance of MSSM-GNN on three different datasets with varying values of the hyperparameter $c$.

Theorems & Definitions (1)

  • Proposition: time complexity of the MWLSP kernel