GraphPrint: Extracting Features from 3D Protein Structure for Drug Target Affinity Prediction

Amritpal Singh

TL;DR

GraphPrint addresses the limitation of sequence-based DTA models by incorporating 3D protein structure into a multimodal graph framework. It uses AlphaFold-based protein graphs, RDKit drug graphs, and handcrafted fingerprints in a four-branch network, with ablation confirming the 3D features provide complementary information. The method achieves a mean squared error of $0.1378$ and a concordance index of $0.8929$ on the KIBA dataset, competitive with state-of-the-art approaches. This work suggests that integrating 3D structure can accelerate drug discovery and offers a public 3D-aware KIBA resource for future research.

Abstract

Accurate drug target affinity prediction can improve drug candidate selection, accelerate the drug discovery process, and reduce drug production costs. Previous work focused on traditional fingerprints or on features extracted from the protein's amino acid sequence, ignoring the protein's 3D structure, which affects its binding affinity. In this work, we propose GraphPrint: a framework for incorporating 3D protein structure features for drug target affinity prediction. We generate graph representations of protein 3D structures using amino acid residue location coordinates and combine them with drug graph representations and traditional features to jointly learn drug target affinity. Our model achieves a mean squared error of 0.1378 and a concordance index of 0.8929 on the KIBA dataset and improves over using traditional protein features alone. Our ablation study shows that the 3D protein structure-based features provide information complementary to traditional features.
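The two metrics reported above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' evaluation code: the function names are hypothetical, and the concordance index is assumed to follow the pairwise definition commonly used in drug target affinity benchmarks (a pair with different true affinities scores 1 if the predictions are ordered the same way, 0.5 if the predictions tie, and 0 otherwise).

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are correctly ordered.

    Assumed definition: only pairs with different true values are comparable;
    tied predictions count as half-correct.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Pairwise differences: entry (i, j) compares sample i with sample j.
    true_diff = y_true[:, None] - y_true[None, :]
    pred_diff = y_pred[:, None] - y_pred[None, :]
    # Keep each unordered pair once, oriented so the true difference is positive.
    comparable = true_diff > 0
    n_pairs = int(comparable.sum())
    if n_pairs == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    correct = (pred_diff[comparable] > 0).sum()
    ties = (pred_diff[comparable] == 0).sum()
    return float((correct + 0.5 * ties) / n_pairs)
```

The pairwise version above uses O(n²) memory, which is fine for illustration but may need batching for a test set the size of KIBA.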

Paper Structure

This paper contains 13 sections, 1 equation, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Protein structure can be visualized at several levels. The primary structure is the sequence of amino acids in the protein. The secondary structure involves the formation of helix and sheet structures. The tertiary structure involves the folding of the amino acid chain in 3D space. Proteins with more than one peptide chain can have a quaternary structure, which involves further folding of the chains over each other.
  • Figure 2: Diagram showing the pipeline for protein feature extraction using AlphaFold and the architecture of GraphPrint, a multi-head network containing graph isomorphism convolution layers (GINConv) and 1D convolutional blocks, followed by concatenation of the features into a multilayer perceptron as a classifier.
  • Figure 3: Protein graph representation: we calculate the center of mass of each amino acid residue and use it as the position of that residue. Each residue is a node, with node features containing position and amino acid properties.
  • Figure 4: Error breakdown by drug ID, protein ID, aromatic compounds in drugs, and bonds inside the drug. A small number of drugs and proteins contribute most of the error.
  • Figure 5: Scatter plots showing the MSE contribution vs. different parameters. The error contribution of a drug molecule is linearly related to its number of atoms, aromatic atoms, and bonds.
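
The protein graph described in the Figure 3 caption (one node per residue, positioned at the residue's center of mass) could be sketched as follows. This is a hedged illustration, not the paper's pipeline: the function names and input arrays are hypothetical, and the rule for connecting residues is an assumption (an edge whenever two centers lie within a distance cutoff, here 8 Å), since the captions do not say how edges are defined.

```python
import numpy as np


def residue_centers_of_mass(coords, masses, residue_ids):
    """Mass-weighted mean position of the atoms in each residue.

    coords: (n_atoms, 3) atom coordinates; masses: (n_atoms,) atomic masses;
    residue_ids: (n_atoms,) residue index of each atom.
    Returns the sorted unique residue ids and an (n_residues, 3) array.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    residue_ids = np.asarray(residue_ids)
    unique_ids = np.unique(residue_ids)
    centers = np.empty((len(unique_ids), 3))
    for k, rid in enumerate(unique_ids):
        mask = residue_ids == rid
        centers[k] = np.average(coords[mask], axis=0, weights=masses[mask])
    return unique_ids, centers


def build_edge_index(centers, cutoff=8.0):
    """Connect residues whose centers are closer than `cutoff`.

    The cutoff rule and its default value are assumptions for illustration.
    Returns a (2, n_edges) array of directed edges without self-loops.
    """
    centers = np.asarray(centers, dtype=float)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    adjacency = dist < cutoff
    np.fill_diagonal(adjacency, False)  # a residue is not its own neighbor
    src, dst = np.nonzero(adjacency)
    return np.stack([src, dst])


def node_features(centers, residue_properties):
    """Concatenate position with per-residue property vectors (hypothetical)."""
    return np.hstack([np.asarray(centers, dtype=float),
                      np.asarray(residue_properties, dtype=float)])
```

The `(2, n_edges)` layout matches the edge-index convention used by common graph learning libraries, so the output could be passed to a graph convolution layer after conversion to tensors.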