EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction
Yang Zhang, Zhewei Wei, Ye Yuan, Chongxuan Li, Wenbing Huang
TL;DR
EquiPocket introduces an E(3)-equivariant geometric graph neural network for ligand binding site prediction that avoids voxel-based representations. It integrates a Local Geometric Modeling module that operates on surface probes, a Global Structure Modeling module that encodes the chemical and spatial information of the protein, and a Surface Message Passing module that performs equivariant message passing over surface atoms, followed by an adaptive Dense Attention output layer. As a result, its binding site predictions are invariant to rotations and translations of the input, and the model can handle proteins of varying sizes. Ablation and benchmark results show significant performance gains over state-of-the-art approaches, with further improvements from a relative-direction task and density-aware attention. The work advances structure-based drug discovery by delivering a geometry-aware, scalable, and rotation-invariant framework for accurately identifying binding sites on large and irregular protein surfaces.
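To make the equivariance property concrete, the sketch below implements a generic EGNN-style layer (the E(n)-equivariant GNN recipe of Satorras et al.), not EquiPocket's published update rule. Messages depend only on node features and squared inter-atom distances, so they are invariant; coordinates are updated by an invariant scalar times a relative position vector, so they move with the input. The names `egnn_layer`, `w_msg`, `w_coord`, and `equivariance_error` are hypothetical, and the scalar weights are stand-ins for learned MLPs.

```python
import math
import random


def sub(a, b):
    """Component-wise difference of two 3D vectors."""
    return tuple(ai - bi for ai, bi in zip(a, b))


def sq_norm(v):
    """Squared Euclidean norm of a 3D vector."""
    return sum(c * c for c in v)


def apply(R, v, t=(0.0, 0.0, 0.0)):
    """Return R @ v + t for a 3x3 matrix R given as nested tuples."""
    return tuple(sum(R[r][c] * v[c] for c in range(3)) + t[r] for r in range(3))


def egnn_layer(h, x, edges, w_msg=(0.5, -0.3, 0.2), w_coord=0.1):
    """One EGNN-style layer (hypothetical weights replace learned MLPs).

    h: list of scalar node features; x: list of 3D coordinates;
    edges: list of directed (i, j) pairs.
    """
    n = len(h)
    agg = [0.0] * n
    shift = [[0.0, 0.0, 0.0] for _ in range(n)]
    deg = [0] * n
    for i, j in edges:
        diff = sub(x[i], x[j])
        # Invariant message: uses only features and the squared distance.
        m = math.tanh(w_msg[0] * h[i] + w_msg[1] * h[j] + w_msg[2] * sq_norm(diff))
        agg[i] += m
        # Equivariant update: invariant scalar times a relative position vector.
        for k in range(3):
            shift[i][k] += w_coord * m * diff[k]
        deg[i] += 1
    new_h = [h[i] + agg[i] for i in range(n)]
    new_x = [tuple(x[i][k] + shift[i][k] / max(deg[i], 1) for k in range(3))
             for i in range(n)]
    return new_h, new_x


def equivariance_error(h, x, edges, R, t):
    """Max deviation of h(Rx+t) from h(x), and of x'(Rx+t) from R x'(x) + t."""
    h1, x1 = egnn_layer(h, x, edges)
    h2, x2 = egnn_layer(h, [apply(R, p, t) for p in x], edges)
    err_h = max(abs(a - b) for a, b in zip(h1, h2))
    err_x = max(max(abs(a - b) for a, b in zip(apply(R, p, t), q))
                for p, q in zip(x1, x2))
    return max(err_h, err_x)


if __name__ == "__main__":
    random.seed(0)
    n = 6
    h = [random.random() for _ in range(n)]
    x = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j]
    th = 0.7
    Rz = ((math.cos(th), -math.sin(th), 0.0),
          (math.sin(th), math.cos(th), 0.0),
          (0.0, 0.0, 1.0))
    mirror = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0))
    print(equivariance_error(h, x, edges, Rz, (1.0, -2.0, 3.0)))
    print(equivariance_error(h, x, edges, mirror, (0.0, 0.0, 0.0)))
```

Both printed errors should be at floating-point round-off level. The mirror test matters because E(3), unlike SE(3), also includes reflections, and distance-based messages are unaffected by them.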
Abstract
Predicting the binding sites of target proteins plays a fundamental role in drug discovery. Most existing deep-learning methods treat a protein as a 3D image by spatially clustering its atoms into voxels and then feed the voxelized protein into a 3D CNN for prediction. However, CNN-based methods suffer from several critical issues: they are 1) deficient in representing irregular protein structures; 2) sensitive to rotations; 3) insufficient for characterizing the protein surface; and 4) unaware of shifts in protein size. To address these issues, this work proposes EquiPocket, an E(3)-equivariant Graph Neural Network (GNN) for binding site prediction, which comprises three modules: the first extracts local geometric information for each surface atom, the second models both the chemical and spatial structure of the protein, and the third captures the geometry of the surface via equivariant message passing over the surface atoms. We further propose a dense attention output layer to mitigate the effect of variable protein size. Extensive experiments on several representative benchmarks demonstrate the superiority of our framework over state-of-the-art methods.
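The abstract states only that the dense attention output layer mitigates the effect of variable protein size. One plausible mechanism, assumed here rather than taken from the paper's equations, is per-atom attention over the representations produced by every message-passing layer, so each atom can weight shallow (small receptive field) or deep (large receptive field) outputs. The function `dense_attention_readout` and the scoring vector `w` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dense_attention_readout(layer_outputs, w):
    """Hypothetical dense attention readout.

    layer_outputs[l][v] is the feature vector of node v after layer l.
    For each node, score every layer's representation with w, normalize
    the scores with softmax, and return (weights, fused vector).
    """
    num_layers = len(layer_outputs)
    num_nodes = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    result = []
    for v in range(num_nodes):
        reps = [layer_outputs[l][v] for l in range(num_layers)]
        # One scalar score per layer for this node.
        scores = [sum(wk * rk for wk, rk in zip(w, r)) for r in reps]
        alpha = softmax(scores)
        # Convex combination of the node's per-layer representations.
        fused = [sum(alpha[l] * reps[l][k] for l in range(num_layers))
                 for k in range(dim)]
        result.append((alpha, fused))
    return result


if __name__ == "__main__":
    # Two layers, two nodes, 2D features.
    outs = [[[1.0, 0.0], [0.0, 1.0]],
            [[3.0, 0.0], [0.0, 3.0]]]
    for alpha, fused in dense_attention_readout(outs, w=(1.0, 0.0)):
        print([round(a, 3) for a in alpha], [round(f, 3) for f in fused])
```

Because the weights are computed per node, a large protein and a small one can end up relying on different depths, which is the kind of size adaptivity the abstract describes; a real model would learn `w` (or an MLP) jointly with the rest of the network.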
