
Graph Residual based Method for Molecular Property Prediction

Kanad Sen, Saksham Gupta, Abhishek Raj, Alankar Alankar

TL;DR

A novel deep learning method, the Edge Conditioned Residual Graph Neural Network (ECRGNN), has been applied, allowing molecular properties to be predicted directly from only the graph-based structures of the molecules.

Abstract

Machine learning-driven methods for property prediction have attracted considerable interest. However, much work remains to be done to improve their generalization ability, accuracy, and inference time for critical applications. Traditional machine learning models predict properties from features extracted from the molecules, and such features are often not readily available. In this work, a novel deep learning method, the Edge Conditioned Residual Graph Neural Network (ECRGNN), is applied, allowing properties to be predicted directly from only the graph-based structures of the molecules. The SMILES (Simplified Molecular Input Line Entry System) representation of the molecules is used as the input data format; it is converted into a graph database, which constitutes the training data. This manuscript gives a detailed description of ECRGNN, the novel GRU-based methodology used to map these inputs. Emphasis is placed on both its regression performance and its classification efficacy. A detailed description of the Variational Autoencoder (VAE) and of the end-to-end learning method used for multi-class, multi-label property prediction is also provided. The results are compared on standard benchmark datasets as well as on some newly developed datasets. All performance metrics used are clearly defined, along with the reasons for choosing them.
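The abstract names three ingredients of ECRGNN: edge-conditioned message passing, a GRU-based update, and residual connections. The NumPy sketch below shows one way these could fit together in a single layer; it is an illustration under stated assumptions, not the authors' implementation. The class name `EdgeConditionedResidualLayer`, the linear filter-generating network (in the style of edge-conditioned convolution, where each bond's features are mapped to its own weight matrix), mean aggregation over incoming edges, the GRU update, and the skip connection `h + GRU(m, h)` are all assumptions chosen for illustration. The toy input is the heavy-atom graph of ethanol (SMILES `CCO`) with hypothetical one-hot atom and bond features.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class EdgeConditionedResidualLayer:
    """Hypothetical layer: edge-conditioned messages, GRU update, residual skip.

    Illustrative assumption only; not the ECRGNN reference implementation.
    """

    def __init__(self, dim, edge_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Filter-generating network (assumed linear): edge features -> dim x dim matrix.
        self.W_edge = rng.normal(scale=0.1, size=(edge_dim, dim * dim))
        self.b_edge = np.zeros(dim * dim)
        # GRU weights for the update (z), reset (r) and candidate (n) gates.
        self.W = rng.normal(scale=0.1, size=(3, dim, dim))  # applied to messages
        self.U = rng.normal(scale=0.1, size=(3, dim, dim))  # applied to hidden state
        self.b = np.zeros((3, dim))

    def messages(self, h, edge_index, edge_attr):
        # Mean over incoming edges of Theta(e_ij) @ h_j, where Theta depends on the bond.
        agg = np.zeros_like(h)
        deg = np.zeros(len(h))
        for (src, dst), e in zip(edge_index, edge_attr):
            theta = (e @ self.W_edge + self.b_edge).reshape(self.dim, self.dim)
            agg[dst] += theta @ h[src]
            deg[dst] += 1.0
        deg[deg == 0] = 1.0  # isolated nodes receive a zero message
        return agg / deg[:, None]

    def gru(self, m, h):
        # GRU cell with the aggregated message m as input and h as the hidden state.
        z = sigmoid(m @ self.W[0].T + h @ self.U[0].T + self.b[0])
        r = sigmoid(m @ self.W[1].T + h @ self.U[1].T + self.b[1])
        n = np.tanh(m @ self.W[2].T + r * (h @ self.U[2].T) + self.b[2])
        return (1.0 - z) * n + z * h

    def __call__(self, h, edge_index, edge_attr):
        m = self.messages(h, edge_index, edge_attr)
        return h + self.gru(m, h)  # residual skip connection


# Toy input: heavy atoms of ethanol (SMILES "CCO"), one-hot over [C, N, O, other].
h0 = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)
# Each bond listed in both directions; bond features one-hot over
# [single, double, triple, aromatic] (hypothetical encoding).
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
edge_attr = [np.array([1, 0, 0, 0], dtype=float)] * len(edges)

layer = EdgeConditionedResidualLayer(dim=4, edge_dim=4)
h1 = layer(h0, edges, edge_attr)
h2 = layer(h1, edges, edge_attr)  # same weights reused for a second step
graph_vec = h2.mean(axis=0)       # mean-pool readout for a property head
print(h2.shape, graph_vec.shape)  # (3, 4) (4,)
```

Because the residual skip adds the GRU output to the layer input, the node feature size must equal the hidden size in this sketch; a practical model would usually project atom features to the hidden dimension first. The pooled vector `graph_vec` would then feed a regression or classification head.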

Paper Structure

This paper contains 19 sections, 7 equations, 13 figures, and 11 tables.

Figures (13)

  • Figure 1: Schematic representation of preprocessing of SMILES data.
  • Figure 2: Schematic diagram of the flow of ECRGNN model.
  • Figure 3: Comparative representation of the GNNExplainer outputs for the target molecule 5-ethylsulfanyl-3-methyl-1-(2-methylpropyl)-6-[[2-(trifluoromethyl)phenyl]methyl]thieno[2,3-d]pyrimidine-2,4-dione: (a) molecular structure, (b) node feature importance, (c) output generated by the ECC model, (d) output generated by the ECRGNN model. The target property is lipophilicity. (The grayscale value of each arrow indicates the strength of the relationship between the two nodes it connects, which correlates directly with the predicted property.)
  • Figure 4: Comparative representation of the GNNExplainer outputs for the target molecule 2-[2-[2,4-dimethoxy-5-[(2-methyl-3,4-dihydro-2H-quinolin-1-yl)sulfonyl]anilino]-2-oxoethyl]sulfanylacetic acid: (a) molecular structure, (b) node feature importance, (c) output generated by the ECC model, (d) output generated by the ECRGNN model. The target property is lipophilicity. (The grayscale value of each arrow indicates the strength of the relationship between the two nodes it connects, which correlates directly with the predicted property.)
  • Figure 5: Comparative representation of the GNNExplainer outputs for the target molecule 4-methoxy-1H-indazole (COc1cccc2[nH]ncc12): (a) molecular structure, (b) node feature importance, (c) output generated by the ECC model, (d) output generated by the ECRGNN model. The target property is lipophilicity. (The grayscale value of each arrow indicates the strength of the relationship between the two nodes it connects, which correlates directly with the predicted property.)
  • ...and 8 more figures