
Enhancing Multimodal Protein Function Prediction Through Dual-Branch Dynamic Selection with Reconstructive Pre-Training

Xiaoling Luo, Peng Chen, Chengliang Liu, Xiaopeng Jin, Jie Wen, Yumeng Liu, Junsong Wang

TL;DR

This work tackles protein function prediction by integrating multimodal data (sequence, spatial structure, and interactions) in a two-stage learning framework. It first applies reconstructive pre-training through PSSI and PSeI encoder–decoder modules to mine fine-grained, low-level semantic features. It then uses a dual-branch architecture with a Bidirectional Interaction Module (BInM) and a Dynamic Selection Module (DSM) for deep inter-modal learning and adaptive feature selection. The approach yields significant gains over both unimodal and existing multimodal methods across the three GO branches (BPO, MFO, CCO), and ablation studies and feature analyses support the contributions of BInM, DSM, and pre-training. The model's ability to extract rich multimodal representations and tailor feature combinations dynamically suggests practical value for scalable, accurate protein function annotation, particularly when data are noisy and heterogeneous.
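In its simplest form, reconstructive pre-training trains an encoder together with a decoder that tries to rebuild features from the encoder's compressed code, so the encoder learns to keep fine-grained detail. The sketch below illustrates only this generic idea with a minimal linear autoencoder in NumPy, trained by plain gradient descent on mean squared reconstruction error. The function name `train_linear_autoencoder`, the dimensions, the learning rate, and the random input features are all hypothetical; the summary does not specify the architecture of the PSSI and PSeI encoders or exactly what their decoders reconstruct.

```python
import numpy as np

def train_linear_autoencoder(X, hidden_dim, lr=0.1, epochs=300, seed=0):
    """Hypothetical stand-in for reconstructive pre-training (not the PSSI/PSeI design)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, hidden_dim))  # encoder weights
    W_dec = rng.normal(scale=0.1, size=(hidden_dim, d))  # decoder weights
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc            # encode: compress input features
        X_hat = Z @ W_dec        # decode: try to reconstruct the input
        err = X_hat - X
        losses.append(np.mean(err ** 2))  # mean squared reconstruction error
        # Backpropagate the MSE through both linear maps.
        grad_X_hat = 2.0 * err / err.size
        grad_W_dec = Z.T @ grad_X_hat
        grad_Z = grad_X_hat @ W_dec.T
        grad_W_enc = X.T @ grad_Z
        # Update only after both gradients are computed.
        W_enc -= lr * grad_W_enc
        W_dec -= lr * grad_W_dec
    return W_enc, W_dec, losses

# Usage with random placeholder features (200 proteins, 8 input dims).
X = np.random.default_rng(1).normal(size=(200, 8))
W_enc, W_dec, losses = train_linear_autoencoder(X, hidden_dim=4)
```

After pre-training, only the encoder (here `W_enc`) would be kept and reused to produce features for the downstream model; the decoder exists solely to supply the reconstruction signal.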

Abstract

Multimodal protein features play a crucial role in protein function prediction. However, these features span a wide range of information, from structural data and sequence features to protein attributes and interaction networks, which makes their complex interconnections difficult to decipher. In this work, we propose DSRPGO, a multimodal protein function prediction method built on dynamic selection and reconstructive pre-training mechanisms. To capture complex protein information, we introduce reconstructive pre-training to mine finer-grained information at low semantic levels. Moreover, we propose the Bidirectional Interaction Module (BInM) to facilitate interactive learning among multimodal features. Additionally, to address the difficulty of hierarchical multi-label classification in this task, we design a Dynamic Selection Module (DSM) that selects the feature representation most conducive to the current protein function prediction. DSRPGO achieves significant improvements in BPO, MFO, and CCO on human datasets, outperforming other benchmark models.
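The two modules can be pictured with the NumPy sketch below. It assumes that BInM behaves like bidirectional cross-attention (each modality attends to the other and adds the result residually) and that DSM behaves like a softmax gate that weights candidate representations, for instance the outputs of the two branches. Both functions are hypothetical stand-ins meant only to show the general mechanism: the names, the mean pooling, and the gate vector `w_gate` are assumptions, and the BiMamba block mentioned in the paper's figures is not modeled here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    # Scaled dot-product attention: each query row attends over all context rows.
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ context

def bidirectional_interaction(a, b):
    # Hypothetical BInM stand-in: each modality is updated with information
    # from the other, with a residual connection; both use the original inputs.
    a_new = a + cross_attention(a, b)
    b_new = b + cross_attention(b, a)
    return a_new, b_new

def dynamic_selection(candidates, w_gate):
    # Hypothetical DSM stand-in: score each candidate vector, normalize the
    # scores with softmax, and return the weighted combination.
    stacked = np.stack(candidates, axis=0)  # (k, d)
    weights = softmax(stacked @ w_gate)     # (k,)
    return weights @ stacked, weights

# Usage with random placeholder features for a single protein.
rng = np.random.default_rng(0)
seq_feats = rng.normal(size=(5, 16))     # e.g. 5 sequence-derived tokens
struct_feats = rng.normal(size=(3, 16))  # e.g. 3 structure/interaction tokens
seq_out, struct_out = bidirectional_interaction(seq_feats, struct_feats)
candidates = [seq_out.mean(axis=0), struct_out.mean(axis=0)]
fused, weights = dynamic_selection(candidates, w_gate=rng.normal(size=16))
```

Because the gate weights are computed from the candidates themselves, the mix can differ from one protein to the next, which is one plausible reading of "dynamic" selection.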

Paper Structure

This paper contains 15 sections, 9 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: An illustration of our proposed method, which is divided into two stages. The first stage pre-trains the Protein Spatial Structure Information (PSSI) encoder and the Protein Sequence Information (PSeI) encoder to inject multimodal knowledge. The second stage trains the proposed DSRPGO model, which consists of an MSL-Branch, a MIL-Branch with the Bidirectional Interaction Module (BInM), and the Dynamic Selection Module (DSM).
  • Figure 2: Structure of the BiMamba block.
  • Figure 3: Davies-Bouldin score comparison of different protein feature representations. o_PPI, o_Attribute, and o_Sequence denote the original embeddings of the PPI network, subcellular localization combined with domain information, and the protein language model, respectively. MSL_embedding, MSI_embedding, and DSM_embedding denote the embeddings from the MSL-Branch, the MIL-Branch, and the DSM, respectively.
  • Figure 4: Visualization of different feature representations for DSRPGO, and comparison with CFAGO.
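Figure 3 compares embeddings with the Davies-Bouldin score. For reference, the index is DB = (1/k) Σ_i max_{j≠i} (s_i + s_j) / d(c_i, c_j), where s_i is the mean distance of cluster i's points to its centroid c_i and d is the Euclidean distance between centroids; lower values indicate tighter, better-separated clusters. A from-scratch NumPy sketch follows. How proteins are grouped into clusters for Figure 3 is not stated here, so `labels` is simply an assumed input; the function name `davies_bouldin` is illustrative, and scikit-learn offers the same measure as `sklearn.metrics.davies_bouldin_score`.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index; assumes at least two clusters with distinct centroids."""
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Within-cluster scatter: mean Euclidean distance of points to their centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(clusters)
    ])
    k = len(clusters)
    ratios = np.zeros(k)
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if i != j:
                separation = np.linalg.norm(centroids[i] - centroids[j])
                worst = max(worst, (scatter[i] + scatter[j]) / separation)
        ratios[i] = worst  # worst-case similarity of cluster i to any other cluster
    return ratios.mean()

# Two tight clusters far apart: each scatter is 1, centroids are 10 apart.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
score = davies_bouldin(X, labels)  # (1 + 1) / 10 = 0.2
```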