VN-EGNN: E(3)-Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification
Florian Sestak, Lisa Schneckenreiter, Johannes Brandstetter, Sepp Hochreiter, Andreas Mayr, Günter Klambauer
TL;DR
This work tackles binding site identification in proteins using 3D structure data. It introduces VN-EGNN, an $E(3)$-equivariant graph neural network that augments EGNNs with $K$ virtual nodes connected to all physical nodes and uses a three-phase message passing scheme to learn hidden geometric entities such as binding-site centers. The model optimizes a joint objective that combines a binding-site center loss $L_{\mathrm{bsc}}$ with a segmentation loss $L_{\mathrm{dice}}$ and employs a self-confidence module, yielding state-of-the-art performance on COACH420, HOLO4K, and PDBbind2020. By predicting binding-site centers directly and leveraging virtual-node representations, VN-EGNN provides accurate, interpretable proposals for ligand binding regions while remaining computationally efficient on residue-level graphs. This supports structure-based drug design and could scale to large structure databases such as AlphaFold predictions.
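As a rough illustration of the joint objective, the sketch below combines a soft Dice segmentation loss with a binding-site center loss in plain Python. The unweighted sum, the nearest-predicted-center form of the center term, and the function names `dice_loss`, `center_loss`, and `joint_loss` are assumptions made for illustration; the summary does not specify them.

```python
def dice_loss(probs, labels, eps=1e-8):
    """Soft Dice loss: 1 - 2 * overlap / (sum of predictions + sum of labels)."""
    overlap = sum(p * y for p, y in zip(probs, labels))
    total = sum(probs) + sum(labels)
    return 1.0 - (2.0 * overlap + eps) / (total + eps)


def center_loss(pred_centers, true_centers):
    """Assumed form: mean squared distance from each true center
    to its nearest predicted center (one prediction per virtual node)."""
    acc = 0.0
    for t in true_centers:
        acc += min(
            sum((a - b) ** 2 for a, b in zip(t, p)) for p in pred_centers
        )
    return acc / len(true_centers)


def joint_loss(probs, labels, pred_centers, true_centers):
    """Assumed unweighted sum L_bsc + L_dice."""
    return center_loss(pred_centers, true_centers) + dice_loss(probs, labels)


# Toy example: four residues, two virtual nodes, one known binding-site center.
probs = [0.9, 0.8, 0.1, 0.2]           # predicted per-residue binding probability
labels = [1, 1, 0, 0]                  # ground-truth binding-site membership
pred_centers = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
true_centers = [(1.0, 0.0, 0.0)]

loss = joint_loss(probs, labels, pred_centers, true_centers)
```

Taking the minimum over predicted centers means only the closest virtual node is penalized for each known site, which leaves the remaining virtual nodes free to cover other pockets. This is one plausible design under the stated assumptions, not necessarily the authors' exact formulation.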
Abstract
Identifying regions within or around proteins to which ligands can potentially bind is an essential step in developing new drugs. Binding site identification methods can now profit from the large number of 3D structures available in protein structure databases or from AlphaFold predictions. Current binding site identification methods rely heavily on graph neural networks (GNNs), which are usually designed to output E(3)-equivariant predictions. Such methods have proven very beneficial for physics-related tasks like binding energy or motion trajectory prediction. However, the performance of GNNs at binding site identification is still limited, potentially due to the lack of dedicated nodes that model hidden geometric entities, such as binding pockets. In this work, we extend E(n)-Equivariant Graph Neural Networks (EGNNs) by adding virtual nodes and applying an extended message passing scheme. The virtual nodes in these graphs are dedicated to learning representations of binding sites, which leads to improved predictive performance. In our experiments, we show that our proposed method, VN-EGNN, sets a new state of the art in locating binding site centers on COACH420, HOLO4K, and PDBbind2020.
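To make the idea of virtual nodes with an extended message passing scheme more concrete, here is a minimal sketch in plain Python. It assumes three phases per layer (physical to physical, physical to virtual, virtual to physical), scalar node features, full connectivity, and fixed toy functions in place of learned MLPs. All function names are hypothetical. The final lines check numerically that rotating and translating the input moves the output coordinates in the same way and leaves the features unchanged.

```python
import math


def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def egnn_update(h_recv, x_recv, h_send, x_send, skip_self=False):
    """One EGNN-style pass: each receiver aggregates messages from all senders.
    Toy tanh functions stand in for the learned message and update networks."""
    new_h, new_x = [], []
    for i, (hi, xi) in enumerate(zip(h_recv, x_recv)):
        agg, shift, n = 0.0, [0.0, 0.0, 0.0], 0
        for j, (hj, xj) in enumerate(zip(h_send, x_send)):
            if skip_self and i == j:
                continue
            # The message uses only invariants: features and squared distance.
            m = math.tanh(hi * hj + 1.0 / (1.0 + sqdist(xi, xj)))
            w = math.tanh(m)  # scalar weight on the relative position vector
            agg += m
            shift = [s + (a - b) * w for s, a, b in zip(shift, xi, xj)]
            n += 1
        new_h.append(hi + math.tanh(agg))
        new_x.append([a + s / n for a, s in zip(xi, shift)] if n else list(xi))
    return new_h, new_x


def vn_egnn_layer(h_a, x_a, h_v, x_v):
    """Assumed three-phase layer over physical (a) and virtual (v) nodes."""
    h_a, x_a = egnn_update(h_a, x_a, h_a, x_a, skip_self=True)  # phase I
    h_v, x_v = egnn_update(h_v, x_v, h_a, x_a)                  # phase II
    h_a, x_a = egnn_update(h_a, x_a, h_v, x_v)                  # phase III
    return h_a, x_a, h_v, x_v


def transform(points, theta, t):
    """Rotate about the z-axis by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y + t[0], s * x + c * y + t[1], z + t[2]]
            for x, y, z in points]


# Toy protein with four physical nodes and two virtual nodes.
h_a = [0.1, -0.2, 0.3, 0.5]
x_a = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [1.0, 1.0, 1.0]]
h_v = [0.0, 1.0]
x_v = [[3.0, 0.0, 0.0], [0.0, 0.0, 3.0]]

theta, t = 0.7, (1.0, -2.0, 3.0)
out = vn_egnn_layer(h_a, x_a, h_v, x_v)
out_t = vn_egnn_layer(h_a, transform(x_a, theta, t), h_v, transform(x_v, theta, t))

# Equivariance: transforming the input should transform the virtual-node output.
expected_v = transform(out[3], theta, t)
coord_err = max(abs(a - b) for p, q in zip(expected_v, out_t[3]) for a, b in zip(p, q))
feat_err = max(abs(a - b) for a, b in zip(out[2], out_t[2]))
```

Because messages depend only on features and distances, and coordinates are updated along relative position vectors, both errors should vanish up to floating-point precision. The virtual-node coordinates can then be read out as candidate binding-site centers.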
