Multi-objective Genetic Programming with Multi-view Multi-level Feature for Enhanced Protein Secondary Structure Prediction

Yining Qian, Lijie Su, Meiling Xu, Xianpeng Wang

Abstract

Predicting protein secondary structure is essential for understanding protein function and advancing drug discovery. However, the intricate sequence-structure relationship makes accurate modeling difficult. To address this challenge, we propose MOGP-MMF, a multi-objective genetic programming (GP) framework that reformulates protein secondary structure prediction (PSSP) as an automated optimization task focused on feature selection and fusion. Specifically, MOGP-MMF introduces a multi-view multi-level representation strategy that integrates evolutionary, semantic, and newly introduced structural views to capture protein folding logic comprehensively. Leveraging an enriched operator set, the framework evolves both linear and nonlinear fusion functions, capturing high-order feature interactions while reducing fusion complexity. To resolve the accuracy-complexity trade-off, we develop an improved multi-objective GP algorithm that incorporates a knowledge transfer mechanism, which uses prior evolutionary experience to guide the population toward global optima. Extensive experiments on seven benchmark datasets demonstrate that MOGP-MMF surpasses state-of-the-art methods, particularly in Q8 accuracy and structural integrity. Furthermore, MOGP-MMF generates a diverse set of non-dominated solutions, offering flexible model selection for different practical application scenarios. The source code is available on GitHub: https://github.com/qian-ann/MOGP-MMF/tree/main.
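
To make the notion of a non-dominated solution set concrete, the following is a minimal sketch of Pareto filtering over two objectives: accuracy (to be maximized) and program complexity (to be minimized). It is not taken from the MOGP-MMF code base. The names `dominates` and `non_dominated`, the candidate labels, and all numeric values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from typing import List, Tuple

# (name, accuracy, complexity); hypothetical candidates, not reported results.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is no worse than `b` on both objectives
    (higher accuracy, lower complexity) and strictly better on at least one."""
    _, acc_a, size_a = a
    _, acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def non_dominated(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


candidates: List[Candidate] = [
    ("p1", 0.70, 5),
    ("p2", 0.74, 12),
    ("p3", 0.73, 20),  # dominated by p2: lower accuracy, higher complexity
    ("p4", 0.76, 30),
    ("p5", 0.70, 9),   # dominated by p1: same accuracy, higher complexity
]

front = non_dominated(candidates)
print([name for name, _, _ in front])
```

Each program remaining in `front` represents a different accuracy-complexity compromise, which is the sense in which a non-dominated set supports flexible model selection: a user can pick the smallest program that meets an accuracy requirement, or the most accurate program within a size budget.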

Paper Structure

This paper contains 28 sections, 6 equations, 7 figures, 7 tables, 3 algorithms.

Figures (7)

  • Figure 1: Crystal structure of chimeric carbonic anhydrase VI bound to ethoxzolamide (PDB ID: 6QL2), illustrating protein secondary structure prediction.
  • Figure 2: The overall workflow of MOGP-MMF. It involves extracting multi-view features (PSSM, HMM, T5, SaProt), performing evolutionary feature fusion via an enriched operator set and genetic strategies, and finally selecting the optimal program for performance evaluation.
  • Figure 3: Architecture of multi-level feature extractors.
  • Figure 4: Comparison of models with and without nonlinear operators on the CB6133 dataset: (a) Accuracy comparison under MOGP, (b) Complexity comparison under MOGP.
  • Figure 5: Performance comparisons on the CB6133 dataset. (a-b) Accuracy and complexity of SOGP vs. Naive MOGP. (c-d) Impact of Knowledge Transfer (KT) on MOGP accuracy and complexity.
  • ...and 2 more figures