CTAGE: Curvature-Based Topology-Aware Graph Embedding for Learning Molecular Representations

Yili Chen, Zhengyu Li, Zheng Wan, Hui Yu, Xian Wei

TL;DR

This paper addresses the challenge of incorporating spatial information into molecular property prediction without incurring heavy 3D modeling costs. It introduces CTAGE, a curvature-based topology-aware graph embedding that computes $k$-hop Forman-Ricci curvature on molecular graphs and embeds these signals into existing graph neural networks and graph transformers (e.g., Graphormer, GCN) by forming curvature-aware node features. The authors demonstrate that 2-hop curvature embeddings yield notable gains across MoleculeNet benchmarks, with Graphormer showing marked improvements and GCN benefiting from curvature signals as well, while highlighting potential overfitting with excessive hops. CTAGE advances practical molecular representation by enriching topological information through a lightweight, geometry-inspired descriptor that can be plugged into common GNN/Transformer backbones, offering improved predictive performance without significantly increasing training complexity.
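To make the idea of curvature-aware node features concrete, here is a minimal sketch. It does not reproduce the paper's definition. It assumes the simplest combinatorial Forman-Ricci curvature for an unweighted edge, $F(u,v) = 4 - \deg(u) - \deg(v)$, and it assumes that a node's $k$-hop curvature is the sum of $F$ over all edges with at least one endpoint within $k-1$ hops of that node. The function names, the aggregation rule, and the heavy-atom-only graphs (hydrogens and bond orders ignored) are illustrative choices, not taken from CTAGE.

```python
from collections import deque


def forman_edge_curvature(adj, u, v):
    """Simplest combinatorial Forman-Ricci curvature of an unweighted edge."""
    return 4 - len(adj[u]) - len(adj[v])


def k_hop_nodes(adj, source, k):
    """Return the set of nodes within k hops of source (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the k-hop boundary
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def k_hop_node_curvature(adj, node, k):
    """Assumed aggregation: sum edge curvature over every edge that has an
    endpoint within k-1 hops of node (k=1 gives the incident edges only)."""
    inner = k_hop_nodes(adj, node, k - 1)
    edges = {frozenset((a, b)) for a in inner for b in adj[a]}
    return sum(forman_edge_curvature(adj, *tuple(e)) for e in edges)


def curvature_features(adj, max_hops):
    """Per-node feature vector [1-hop, 2-hop, ..., max_hops-hop curvature]."""
    return {
        node: [k_hop_node_curvature(adj, node, k) for k in range(1, max_hops + 1)]
        for node in adj
    }


# Heavy-atom graphs as adjacency lists (hydrogens and bond orders ignored).
benzene = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
chlorobenzene = {n: list(nbrs) for n, nbrs in benzene.items()}
chlorobenzene[0].append(6)  # node 6 is the chlorine, bonded to carbon 0
chlorobenzene[6] = [0]

if __name__ == "__main__":
    for name, graph in [("benzene", benzene), ("chlorobenzene", chlorobenzene)]:
        print(name, curvature_features(graph, max_hops=2))
```

Under these assumptions every benzene carbon gets the feature `[0, 0]`. In chlorobenzene the meta carbon (node 2) gets `[0, -1]` and the para carbon (node 3) gets `[0, 0]`, so the 2-hop value separates positions that the 1-hop value cannot. In practice such a vector would be concatenated with, or added to, the existing atom features before they are passed to a backbone such as GCN or Graphormer.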

Abstract

AI-driven drug design relies significantly on predicting molecular properties, which is a complex task. In current approaches, the most commonly used feature representations for training deep neural network models are based on SMILES and molecular graphs. While these representations are concise and efficient, they are limited in their ability to capture complex spatial information. Recently, researchers have recognized the importance of incorporating three-dimensional information about molecular structures into models. However, capturing spatial information requires the introduction of additional units in the generator, bringing additional design and computational costs. Therefore, it is necessary to develop a method for predicting molecular properties that effectively incorporates spatial structural information while maintaining the simplicity and efficiency of graph neural networks. In this work, we propose an embedding approach, CTAGE, which uses $k$-hop discrete Ricci curvature to extract structural insights from molecular graph data. This effectively integrates spatial structural information while preserving the training complexity of the network. Experimental results indicate that introducing node curvature significantly improves the performance of current graph neural network frameworks, validating that the information from $k$-hop node curvature effectively reflects the relationship between molecular structure and function.

Paper Structure (26 sections, 8 equations, 6 figures, 2 tables, 1 algorithm)

Figures (6)

  • Figure 1: Electronic density maps: (a) benzene, (b) chlorobenzene, (c) differential electronic density map of 1-phenylpyrrole. The 1-hop node curvature information describes the density differences of electron clouds between benzene and chlorobenzene, while the 2-hop node curvature information of 1-phenylpyrrole describes more complex differences in differential electronic density.
  • Figure 2: Encodes $k$-hop node curvature information of atoms in molecules at different cutoff radii into node features.
  • Figure 3: Comparison of regression scatter plots between GCN and GCN & CTAGE (2-hop node curvature) on the ESOL test set.
  • Figure 4: Comparison of regression scatter plots between GCN and GCN & CTAGE (2-hop node curvature) on the FreeSolv test set.
  • Figure 5: Visualization of the latent space of GCN on BACE and BBBP datasets. Using t-SNE to map extracted molecular features into 2D coordinate points, assigning different colors to points based on the positive or negative values of the labels.
  • ...and 1 more figure