Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry

Zeyu Wang, Tianyi Jiang, Jinhuan Wang, Qi Xuan

TL;DR

This work addresses the limitation of single-modality molecular representations by introducing SGGRL, a multi-modal framework that jointly learns from SMILES sequences, molecular graphs, and geometric information. It employs dedicated encoders for each modality, a GlobalAttentionPool-based readout, a learnable fusion layer, and a cross-modal contrastive objective to align representations across modalities. Empirical results on seven MoleculeNet datasets show SGGRL achieving state-of-the-art performance, with ablations confirming the value of multi-modal fusion, Bi-LSTM preprocessing for SMILES, and attention pooling. The approach offers a scalable path to more accurate molecular property predictions, with released code for reproducibility.
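The attention-pooling readout mentioned above can be illustrated with a minimal sketch. It assumes a single linear gate that scores each atom, and the names `attention_pool` and `gate_weights` are hypothetical, chosen for illustration only; the released SGGRL code may differ in its gate network and feature dimensions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(node_feats, gate_weights):
    """Pool per-atom feature vectors into one molecule-level vector.

    node_feats:   list of per-atom feature vectors (lists of floats).
    gate_weights: hypothetical learnable vector (an assumption); a linear
                  gate turns each atom's features into a scalar score.
    """
    # Score every atom, then normalise the scores across the molecule.
    scores = [sum(w * x for w, x in zip(gate_weights, feat)) for feat in node_feats]
    alphas = softmax(scores)
    # Attention-weighted sum of the atom features, one output per dimension.
    dim = len(node_feats[0])
    return [sum(a * feat[d] for a, feat in zip(alphas, node_feats)) for d in range(dim)]


# Toy molecule with three atoms and 2-dimensional features.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = attention_pool(atoms, [0.0, 0.0])    # zero gate: reduces to mean pooling
focused = attention_pool(atoms, [10.0, 10.0])  # strong gate: dominated by the third atom
```

With a zero gate the readout falls back to plain mean pooling, which is one way to see attention pooling as a learnable generalisation of the mean readout.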

Abstract

Molecular property prediction is the task of labeling molecules with biochemical properties, and it plays a pivotal role in drug discovery and design. With the advancement of machine learning, deep learning-based molecular property prediction has emerged as an alternative to resource-intensive traditional methods and has attracted significant attention. Molecular representation learning is the key factor in prediction performance, and many sequence-based, graph-based, and geometry-based methods have been proposed. However, most existing studies learn molecular representations from a single modality and therefore fail to capture molecular characteristics comprehensively. In this paper, we propose SGGRL, a novel multi-modal representation learning model for molecular property prediction that integrates sequence, graph, and geometry characteristics. Specifically, we design a fusion layer to fuse the representations of the different modalities. Furthermore, to ensure consistency across modalities, SGGRL is trained to maximize the similarity of representations of the same molecule while minimizing the similarity of representations of different molecules. To verify the effectiveness of SGGRL, seven molecular datasets and several baselines are used for evaluation and comparison. The experimental results show that SGGRL outperforms the baselines in most cases, which underscores its capability to capture molecular information comprehensively. Overall, SGGRL shows the potential of multi-modal representation learning to extract diverse and comprehensive molecular insights for property prediction. Our code is released at https://github.com/Vencent-Won/SGGRL.
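The cross-modal consistency objective described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses an InfoNCE-style loss with cosine similarity and a temperature, which is a common choice for this kind of objective and not necessarily the exact loss used in SGGRL. The function name `cross_modal_contrastive_loss` and the value `temperature=0.1` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_modal_contrastive_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss between two modalities (an assumed formulation).

    view_a[i] and view_b[i] embed the same molecule (positive pair);
    view_a[i] and view_b[j] with j != i are treated as negatives.
    """
    n = len(view_a)
    loss = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching molecule.
        loss += log_norm - logits[i]
    return loss / n


# Toy batch of two molecules with 2-dimensional embeddings.
seq_emb = [[1.0, 0.0], [0.0, 1.0]]
graph_aligned = [[0.9, 0.1], [0.1, 0.9]]  # agrees with the sequence view
graph_swapped = [[0.1, 0.9], [0.9, 0.1]]  # the two molecules are mixed up

aligned_loss = cross_modal_contrastive_loss(seq_emb, graph_aligned)
swapped_loss = cross_modal_contrastive_loss(seq_emb, graph_swapped)
```

The loss is small when the two views of each molecule agree and large when they are mismatched. With three modalities, one plausible extension is to sum this loss over the sequence-graph, sequence-geometry, and graph-geometry pairs.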

Paper Structure

This paper contains 19 sections, 17 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Examples of the sequence, graph, and geometry modalities of a molecule.
  • Figure 2: Overview of SGGRL.
  • Figure 3: t-SNE visualization of the molecular representation space of GraphMVP, CMPNN, GraSeq, and SGGRL on the BBBP dataset. Red dots denote negative labels, and blue dots denote positive labels.