Multi-Task Genetic Algorithm with Multi-Granularity Encoding for Protein-Nucleotide Binding Site Prediction

Yiming Gao, Liuyi Xu, Pengshan Cui, Yining Qian, An-Yang Lu, Xianpeng Wang

Abstract

Accurate identification of protein-nucleotide binding sites is fundamental to deciphering molecular mechanisms and accelerating drug discovery. However, current computational methods often perform suboptimally because of inadequate feature representation and rigid fusion mechanisms, which hinder the exploitation of cross-task information. To bridge this gap, we propose MTGA-MGE, a framework that integrates a Multi-Task Genetic Algorithm with Multi-Granularity Encoding to enhance binding site prediction. Specifically, we develop a Multi-Granularity Encoding (MGE) network that combines multi-scale convolutions and self-attention mechanisms to distill discriminative signals from high-dimensional, redundant biological data. To overcome the constraints of static fusion, a genetic algorithm adaptively evolves task-specific fusion strategies, improving model generalization. Furthermore, to promote collaborative learning, we introduce an External-Neighborhood Mechanism (ENM) that leverages biological similarities to facilitate targeted information exchange across tasks. Extensive evaluations on fifteen nucleotide datasets demonstrate that MTGA-MGE not only establishes a new state of the art in data-abundant, high-resource scenarios but also retains a competitive edge in rare, low-resource regimes, offering a highly adaptive scheme for decoding complex protein-ligand interactions in the post-genomic era.
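
The abstract does not specify how fusion strategies are encoded or evolved. As a rough illustration of the general idea only (not the authors' MTGA), the sketch below evolves a vector of fusion weights with a plain genetic algorithm. Everything in it is an assumption made for the example: the weight-vector encoding, the toy `fitness` function (a stand-in for validation performance), the hypothetical `target` weights, and the names `normalize` and `evolve_fusion_weights`.

```python
import random


def normalize(weights):
    """Rescale positive weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def fitness(weights, target):
    """Toy stand-in for validation performance (higher is better).

    A real system would train or evaluate a model here; this example
    simply rewards closeness to a hypothetical ideal weighting.
    """
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def evolve_fusion_weights(target, pop_size=20, generations=30,
                          mutation_scale=0.1, seed=0):
    """Evolve fusion weights with elitism, crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    n = len(target)
    population = [normalize([rng.random() + 1e-9 for _ in range(n)])
                  for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, target), reverse=True)
        history.append(fitness(population[0], target))
        elite = population[: pop_size // 2]  # survivors carried over unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Arithmetic crossover followed by Gaussian mutation.
            child = [(x + y) / 2 for x, y in zip(a, b)]
            child = [max(1e-9, c + rng.gauss(0, mutation_scale)) for c in child]
            children.append(normalize(child))
        population = elite + children
    population.sort(key=lambda w: fitness(w, target), reverse=True)
    history.append(fitness(population[0], target))
    return population[0], history


best, history = evolve_fusion_weights(target=[0.6, 0.3, 0.1])
print(best, history[-1])
```

Because the top half of each generation survives unchanged, the best fitness never decreases from one generation to the next. A task-specific variant would run one such search per task with its own fitness function.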

Paper Structure (24 sections, 16 equations, 6 figures, 5 tables, 3 algorithms)

Figures (6)

  • Figure 1: Schematic representation of protein–nucleotide interaction, further illustrating the action of a drug at the protein–nucleotide binding sites. The highlighted regions (orange/yellow) illustrate the binding residues distributed across the protein structure.
  • Figure 2: Framework of MTGA-MGE. The framework operates through three key components: I. Multi-Granularity Encoding, II. Multi-Task Genetic Algorithm, III. External-Neighborhood Mechanism. Finally, IV. Test Process illustrates the model inference phase.
  • Figure 3: Radar plots comparing the performance of NucMTL (red), NucGMTL (blue), and MTGA-MGE (green) on ten low-resource nucleotide tasks. Panels (a)–(c) display the MCC, while panels (d)–(f) present the AUPRC. In each plot, the axes represent specific nucleotide tasks, with a larger enclosed area indicating superior predictive performance.
  • Figure 4: Visualization of protein-ADP binding site predictions. The protein structures are shown in gray cartoon representation. Red: True Positive (TP); Blue: False Negative (FN); Yellow: False Positive (FP).
  • Figure 5: Overall performance of MGE. (1) Performance comparison (MCC and AUPRC) of different convolutional encoder architectures across five high-resource nucleotide tasks. (2) t-SNE visualization comparing the residue representations learned (a) using raw PLM features alone and (b) using MGE. (3) t-SNE visualization comparing residue representations learned under different input configurations: (a) single-task GDP input and (b) joint GDP–ADP input. In all t-SNE plots, data points are colored by prediction outcome: true positive (TP, purple), false negative (FN, red), false positive (FP, blue), and true negative (TN, gray).
  • ...and 1 more figure
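
The captions above report MCC and label residues as TP, FP, FN, or TN. For reference, the Matthews correlation coefficient can be computed from those four counts using the standard definition, as in the minimal sketch below. The function name `mcc` and the choice to return 0.0 when the denominator is zero are assumptions of this example, not details taken from the paper.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal sum is zero (a common convention,
    assumed here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration: binding residues are usually
# a small minority, which is why MCC is preferred over plain accuracy.
print(mcc(tp=30, fp=10, fn=20, tn=940))
```

Unlike accuracy, MCC stays near zero for a predictor that labels every residue as non-binding, which makes it informative on heavily imbalanced binding-site data.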