Co-evolution-based Metal-binding Residue Prediction with Graph Neural Networks

Sayedmohammadreza Rastegari, Sina Tabakhi, Xianyuan Liu, Wei Sang, Haiping Lu

TL;DR

The paper addresses the challenge of predicting metal-binding residues and their metal types by exploiting the entire network of co-evolved residues with graph neural networks. MBGNN constructs co-evolved residue networks from MSAs and PLM embeddings, then applies two SAGEConv-based GNNs (one for metal-binding and one for metal-type) in an M-fold ensemble to improve robustness. Results on a MetalNet2-derived dataset show clear gains in metal-binding precision and metal-type F1 compared with prior co-evolution methods, and competitive performance against sequence-based approaches, with notable strength on underrepresented metals. This work demonstrates that integrating co-evolutionary structure with graph-based learning can enhance understanding of protein-metal interactions and offers a scalable approach for predicting both binding sites and their metal identities, potentially aiding drug discovery and biotechnology applications.

Abstract

In computational structural biology, predicting metal-binding sites and their corresponding metal types is challenging due to the complexity of protein structures and interactions. Conventional sequence- and structure-based prediction approaches cannot capture the evolutionary relationships that drive these interactions, while recent co-evolution-based approaches do not fully consider the entire structure of the co-evolved residue network. In this paper, we introduce MBGNN (Metal-Binding Graph Neural Network), which uses the entire co-evolved residue network and captures the dependencies within protein structures via graph neural networks to improve the prediction of co-evolved metal-binding residues and their associated metal types. Experimental results on a public dataset show that MBGNN outperforms existing co-evolution-based metal-binding prediction methods and is competitive with recent sequence-based methods, showing the potential of integrating co-evolutionary insights with machine learning to deepen our understanding of protein-metal interactions. The MBGNN code is publicly available at https://github.com/SRastegari/MBGNN.
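
To make the idea of a co-evolved residue network concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that co-evolved pairs arrive as tuples of residue indices and that a "network" is a connected component of the resulting graph; both are assumptions made for illustration. The function `mean_aggregate` shows one unweighted GraphSAGE-style step (own features concatenated with the neighbor mean), which is a simplification of a learned SAGEConv layer. All names, indices, and feature values are hypothetical.

```python
from collections import defaultdict


def build_networks(pairs):
    """Group co-evolved residue pairs into connected components.

    Assumption: each pair is (residue_index_i, residue_index_j), and a
    "network" is a connected component of the undirected pair graph.
    """
    adj = defaultdict(set)
    for i, j in pairs:
        adj[i].add(j)
        adj[j].add(i)

    seen, networks = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        networks.append(sorted(component))
    return dict(adj), networks


def mean_aggregate(adj, feats):
    """One unweighted GraphSAGE-style step (no learned weights).

    Each residue's new vector is its own embedding concatenated with the
    element-wise mean of its neighbors' embeddings.
    """
    out = {}
    for node, neighbors in adj.items():
        dim = len(feats[node])
        mean = [sum(feats[nb][d] for nb in neighbors) / len(neighbors)
                for d in range(dim)]
        out[node] = list(feats[node]) + mean
    return out


# Hypothetical co-evolved pairs and 2-dimensional stand-ins for PLM embeddings.
pairs = [(10, 42), (42, 57), (88, 91)]
feats = {10: [1.0, 0.0], 42: [0.0, 1.0], 57: [3.0, 0.0],
         88: [0.5, 0.5], 91: [1.5, 0.5]}

adj, networks = build_networks(pairs)
updated = mean_aggregate(adj, feats)
```

In a trained model the concatenated vector would pass through learned weights and a nonlinearity; the sketch only shows how information from co-evolved neighbors reaches each residue.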

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of MBGNN's pipeline. (a) Co-evolved network construction starts with a protein of interest (POI) chain sequence, followed by multiple sequence alignment (MSA). Then, the MSA Transformer and a protein language model (PLM) are utilized to identify co-evolved pairs and obtain residue-level embeddings, respectively. Once the pairs are extracted, they are organized into co-evolved residue networks, and each residue is mapped to its corresponding PLM-derived embedding. (b) The metal-binding predictor processes these networks using the average probabilities produced by GNNs trained on different folds of data using an M-fold ensemble strategy to identify metal-binding residues. (c) Predicted metal-binding residues are assembled into new co-evolved networks, and each residue is mapped to its corresponding PLM-derived embedding again. The metal-type predictor takes the newly identified co-evolved residue networks and classifies their associated metal type into one of the 11 categories, using probabilities of GNN models again trained using an M-fold ensemble strategy.
  • Figure 2: Distribution of metal types in the dataset.
  • Figure 3: Performance comparison of MBGNN for metal-type prediction on the fixed hold-out test set against co-evolution-based methods (a) and sequence-based methods, i.e., LMetalSite (b) and M-Ionic (c). In all subfigures, the "Mean" column represents the macro average of the values of the metal columns.
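
The captions above mention two simple computations: averaging the probabilities of GNNs trained on different folds (Figure 1b) and the macro-averaged "Mean" column (Figure 3). The sketch below illustrates both in plain Python under stated assumptions: each fold model is assumed to return one probability per residue, and the 0.5 decision threshold, the function names, and all numbers are hypothetical, not values from the paper.

```python
def ensemble_average(fold_probs):
    """Average per-residue probabilities across M fold models.

    fold_probs: list of M lists, each holding one probability per residue.
    """
    m = len(fold_probs)
    n = len(fold_probs[0])
    if any(len(p) != n for p in fold_probs):
        raise ValueError("every fold must score the same residues")
    return [sum(p[i] for p in fold_probs) / m for i in range(n)]


def predict_binding(fold_probs, threshold=0.5):
    """Label a residue as metal-binding if its averaged probability
    reaches the threshold (0.5 is an assumed value)."""
    return [p >= threshold for p in ensemble_average(fold_probs)]


def macro_mean(per_metal_scores):
    """Unweighted mean over metal types, so rare metals count as much
    as common ones."""
    return sum(per_metal_scores.values()) / len(per_metal_scores)


# Hypothetical outputs of M = 3 fold models for two residues.
fold_probs = [[0.75, 0.25],
              [0.50, 0.00],
              [1.00, 0.50]]
avg = ensemble_average(fold_probs)
labels = predict_binding(fold_probs)

# Hypothetical per-metal scores, only to show the macro average.
scores = {"ZN": 0.5, "FE": 0.25, "CU": 0.75}
mean_score = macro_mean(scores)
```

Because the macro average weights every metal type equally, a method that does well on underrepresented metals is rewarded more than it would be under a sample-weighted average.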