Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX

Zhiyuan Chen, Tianhao Chen, Chenggang Xie, Yang Xue, Xiaonan Zhang, Jingbo Zhou, Xiaomin Fang

TL;DR

HelixProtX addresses the fragmentation of protein research across modalities by proposing a unified large multimodal model for any-to-any protein generation. The system uses a two-encoder arrangement (a Sequence Encoder based on HelixFold-Single and a Structure Encoder based on ProteinMPNN) coupled with Abstractors that align multimodal representations to an LLM (ERNIE-Lite). It also introduces a Residue Angle Head that decodes backbone geometry, with a hidden size of $d_h=4096$ and six torsion angles per residue. Training optimizes $L_{\text{NLL}}$ for text and $L_{\text{angle}}$ for structure. Empirical results show that HelixProtX outperforms baselines in description prediction and sequence design, demonstrates robust structure design capabilities, and benefits from joint training compared with task-specific approaches. The approach holds promise for accelerating protein biology research by enabling robust cross-modal generation and reasoning across sequences, structures, and descriptions.
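To make the Residue Angle Head concrete, the following is a minimal NumPy sketch of how per-residue hidden states could be projected to six angles and scored. Only the hidden size $d_h=4096$ and the count of six angles per residue come from the summary above. The single linear projection, the wrapping of angles into $[-\pi, \pi)$, and the mean absolute wrapped difference used as a stand-in for $L_{\text{angle}}$ are illustrative assumptions, and the function names are hypothetical rather than taken from the HelixProtX code.

```python
import numpy as np

D_H = 4096      # hidden size stated in the summary
N_ANGLES = 6    # six angles per residue, as stated in the summary


def wrap_to_pi(x):
    """Wrap angles (radians) into the interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def residue_angle_head(hidden, weight, bias):
    """Project hidden states of shape (L, D_H) to angles of shape (L, N_ANGLES).

    Assumption: a single linear layer followed by wrapping; the actual
    head in HelixProtX may be structured differently.
    """
    return wrap_to_pi(hidden @ weight + bias)


def angle_loss(pred, target):
    """Hypothetical stand-in for L_angle: mean absolute wrapped difference.

    Wrapping the difference makes the loss respect periodicity, so
    predictions near +pi and targets near -pi count as close.
    """
    return float(np.mean(np.abs(wrap_to_pi(pred - target))))


# Usage with random placeholder data (not real protein features).
rng = np.random.default_rng(0)
hidden = rng.standard_normal((5, D_H))             # 5 residues
weight = rng.standard_normal((D_H, N_ANGLES)) * 0.01
bias = np.zeros(N_ANGLES)

angles = residue_angle_head(hidden, weight, bias)  # shape (5, 6)
loss = angle_loss(angles, np.zeros_like(angles))
```

The wrapped difference is one common way to compare periodic quantities; a cosine-based loss would be an equally plausible choice under the same assumptions.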

Abstract

Proteins are fundamental components of biological systems and can be represented through various modalities, including sequences, structures, and textual descriptions. Despite the advances in deep learning and scientific large language models (LLMs) for protein research, current methodologies predominantly focus on limited specialized tasks -- often predicting one protein modality from another. These approaches restrict the understanding and generation of multimodal protein data. In contrast, large multimodal models have demonstrated potential capabilities in generating any-to-any content like text, images, and videos, thus enriching user interactions across various domains. Integrating these multimodal model technologies into protein research offers significant promise by potentially transforming how proteins are studied. To this end, we introduce HelixProtX, a system built upon the large multimodal model, aiming to offer a comprehensive solution to protein research by supporting any-to-any protein modality generation. Unlike existing methods, it allows for the transformation of any input protein modality into any desired protein modality. The experimental results affirm the advanced capabilities of HelixProtX, not only in generating functional descriptions from amino acid sequences but also in executing critical tasks such as designing protein sequences and structures from textual descriptions. Preliminary findings indicate that HelixProtX consistently achieves superior accuracy across a range of protein-related tasks, outperforming existing state-of-the-art models. By integrating multimodal large models into protein research, HelixProtX opens new avenues for understanding protein biology, thereby promising to accelerate scientific discovery.

Paper Structure (26 sections, 2 equations, 7 figures, 4 tables, 3 algorithms)

Figures (7)

  • Figure 1: Overview of HelixProtX. a. Any-to-any protein generation produces diverse output modalities (sequence, structure, and description) from any input modality (sequence, structure, or description). b. Large language model (LLM)-based system for any-to-any protein generation.
  • Figure 2: HelixProtX model architecture, training paradigm, and data overview. a. The architecture of the multimodal language model. b. Training paradigm. c. Demonstration of the input format of various protein-related tasks.
  • Figure 3: Results of description prediction. a. Overall text coherence comparison. b. Impact of protein sequence length. c. Protein function accuracy comparison. d. A case illustrating the description prediction task.
  • Figure 4: Results of sequence design. a. Performance comparison for the description-to-sequence task. b. Performance comparison for the structure-to-sequence task. c. Distribution of the reference sequences and the designed sequences produced by HelixProtX for the description-to-sequence task. d. An example of the sequence design task.
  • Figure 5: Results of structure prediction. a. Performance comparison for the description-to-structure task. b. Performance comparison for the sequence-to-structure task. c. The distribution of predicted protein angles versus actual protein angles. d. An example of the description-to-structure task.
  • ...and 2 more figures