GDEGAN: Gaussian Dynamic Equivariant Graph Attention Network for Ligand Binding Site Prediction

Animesh, Plaban Kumar Bhowmick, Pralay Mitra

Abstract

Accurate prediction of the sites on a protein to which ligands can bind is a critical step in structure-based computational drug discovery. Recently, Equivariant Graph Neural Networks (GNNs) have emerged as a powerful paradigm for binding site identification, owing to the large-scale availability of 3D protein structures from protein databases and AlphaFold predictions. State-of-the-art equivariant GNN methods use dot-product attention, which disregards the variation in the chemical and geometric properties of neighboring residues. To capture this variation, we propose GDEGAN (Gaussian Dynamic Equivariant Graph Attention Network), which replaces dot-product attention with adaptive kernels that recognize binding sites. The proposed attention mechanism captures variation in neighboring residues using statistics of their characteristic local feature distributions. It dynamically computes neighborhood statistics at each layer, using local variance as an adaptive bandwidth parameter with learnable per-head temperatures, so that each protein region determines its own context-specific importance. GDEGAN outperforms existing methods with relative improvements of 37-66% in DCC and 7-19% in DCA success rates across the COACH420, HOLO4k, and PDBBind2020 datasets. These advances have direct application in accelerating protein-ligand docking by identifying potential binding sites for therapeutic target identification.
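The abstract does not give the exact form of the attention kernel, so the following is only a minimal sketch of the general idea: a Gaussian kernel whose bandwidth is the local variance of a node's neighborhood, scaled by a temperature. The function name `gaussian_attention_weights`, the choice of the neighborhood mean as the kernel center, and the `eps` stabilizer are all assumptions made for illustration, not the authors' implementation.

```python
import math

def gaussian_attention_weights(neighbor_feats, temperature=1.0, eps=1e-6):
    """Softmax over Gaussian-kernel scores for one node and one attention head.

    neighbor_feats: list of equal-length feature vectors (lists of floats).
    temperature:    stands in for a learnable per-head temperature (assumption).
    """
    n = len(neighbor_feats)
    dim = len(neighbor_feats[0])
    # Neighborhood mean, per feature dimension.
    mean = [sum(f[d] for f in neighbor_feats) / n for d in range(dim)]
    # Squared distance of each neighbor to the neighborhood mean.
    sq_dists = [sum((f[d] - mean[d]) ** 2 for d in range(dim))
                for f in neighbor_feats]
    # Local variance (mean squared distance) acts as the adaptive bandwidth.
    variance = sum(sq_dists) / n + eps
    # Log of a Gaussian kernel, flattened or sharpened by the temperature.
    scores = [-d2 / (2.0 * variance * temperature) for d2 in sq_dists]
    # Numerically stable softmax over the neighborhood.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical neighbors; the last one is an outlier.
weights = gaussian_attention_weights([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
```

Because the squared distances are divided by the neighborhood's own variance, rescaling all features in a neighborhood leaves the weights essentially unchanged, which is one way to read "each protein region determines its own context-specific importance". A larger temperature flattens the weights; a smaller one concentrates them on neighbors near the local mean.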

Paper Structure (33 sections, 2 theorems, 27 equations, 6 figures, 6 tables)

Key Result

Proposition 3.1

The $E(3)$ equivariance breaks after the introduction of ESM embeddings because the node features no longer encode chirality information. The network therefore maintains $SE(3)$ equivariance but loses reflection equivariance.
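The difference between the two groups can be checked numerically. The sketch below is a generic illustration, not the paper's proof: it shows that pairwise distances are unchanged by both rotations and reflections (so they are $E(3)$-invariant), while a chirality-sensitive quantity, the signed volume of a tetrahedron, survives rotations but flips sign under a reflection (so it is only $SE(3)$-invariant). The four points and all helper names are hypothetical.

```python
import math

def apply(matrix, point):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(matrix[r][c] * point[c] for c in range(3)) for r in range(3)]

def rotation_z(theta):
    """Proper rotation about the z-axis (determinant +1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Mirror through the xy-plane (determinant -1).
REFLECTION_Z = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]

def pairwise_distances(points):
    """All distances between distinct pairs of points."""
    return [math.dist(p, q)
            for i, p in enumerate(points) for q in points[i + 1:]]

def signed_volume(a, b, c, d):
    """Scalar triple product (b-a) . ((c-a) x (d-a)); its sign encodes handedness."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    cross = [v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0]]
    return sum(u[k] * cross[k] for k in range(3))

# Four hypothetical atom positions forming a right-handed tetrahedron.
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rotated = [apply(rotation_z(0.7), p) for p in points]
mirrored = [apply(REFLECTION_Z, p) for p in points]
```

Comparing `signed_volume(*points)` with `signed_volume(*mirrored)` shows the sign flip, whereas `pairwise_distances` returns the same values for all three point sets. A model that behaves consistently under rotations and translations but not under such mirror images is $SE(3)$-equivariant rather than $E(3)$-equivariant.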

Figures (6)

  • Figure 1: GDEGAN architecture for protein ligand binding site identification. Left: Overview of the GDEGAN framework showing the integration of protein-specific ESM-2 embeddings with geometric processing through $L$ layers of Gaussian Dynamic Attention. Right: Detailed view of the Gaussian Dynamic Attention (GDA) module. Here, $\oplus$, $\cdot$, and $\circ$ denote addition, dot product, and element-wise product, respectively. HTR is inherited from GotenNet. Soft. stands for Softmax and Agg. for Aggregation.
  • Figure 2: Visualization of Protein 'PDB:1u72(A)'. Left: Model prediction (red) vs true center (green) with coordinates. Right: Predicted residues: True Positive (green), False Positive (red), False Negative (blue).
  • Figure 3: Attention Pattern Visualizations of Protein 'PDB:3c2f(A)'. Left: attention patterns of GDEGAN. Right: attention patterns of GotenNet (full).
  • Figure 4: Ablation analysis. (a) Model depth analysis showing that performance peaks at 4 layers, with further increases in parameters leading to oversmoothing. (b) Temperature evolution across training epochs, averaged over multiple runs and shown as mean and standard deviation. (c) Distribution of learned temperatures across the 8 attention heads (best model).
  • Figure 5: Training and validation loss curves. Left: (a) training loss convergence. Right: (b) validation loss convergence.
  • ...and 1 more figure
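The page reports DCC and DCA success rates and Figure 2 compares a predicted center with the true one, but the metrics themselves are not defined here. The sketch below assumes the convention commonly used in binding site prediction: DCC is the distance from the predicted center to the ligand center, DCA is the distance from the predicted center to the closest ligand atom, and a prediction counts as a success when the distance is within a threshold (4 Å is a common choice). The function names, the unweighted ligand centroid, and the default threshold are assumptions, not taken from the paper.

```python
import math

def dcc(pred_center, ligand_atoms):
    """Distance from the predicted center to the (unweighted) ligand centroid."""
    n = len(ligand_atoms)
    true_center = [sum(atom[k] for atom in ligand_atoms) / n for k in range(3)]
    return math.dist(pred_center, true_center)

def dca(pred_center, ligand_atoms):
    """Distance from the predicted center to the closest ligand atom."""
    return min(math.dist(pred_center, atom) for atom in ligand_atoms)

def success_rate(distances, threshold=4.0):
    """Fraction of predictions whose distance is within the threshold (assumed 4 Å)."""
    return sum(d <= threshold for d in distances) / len(distances)

# Hypothetical two-atom ligand and one predicted center, coordinates in Å.
ligand = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
prediction = (1.0, 3.0, 0.0)
example_dcc = dcc(prediction, ligand)  # centroid is (1, 0, 0), so the distance is 3.0
example_dca = dca(prediction, ligand)  # both atoms are sqrt(10) away
```

Success rates are then obtained by applying `success_rate` to the per-protein DCC or DCA values of a dataset.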

Theorems & Definitions (6)

  • Proposition 3.1: From $E(3)$ to $SE(3)$ Equivariance with Invariant Node Features
  • Proposition 3.2: GDA Preserves $SE(3)$ Equivariance
  • Remark 3.3
  • Remark 3.4: Parameter and Computational Efficiency
  • proof
  • proof