Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX
Zhiyuan Chen, Tianhao Chen, Chenggang Xie, Yang Xue, Xiaonan Zhang, Jingbo Zhou, Xiaomin Fang
TL;DR
HelixProtX addresses the fragmentation of protein research across modalities by proposing a unified large multimodal model for any-to-any protein generation. The system uses a two-encoder arrangement (a Sequence Encoder based on HelixFold-Single and a Structure Encoder based on ProteinMPNN) coupled with Abstractors that align the multimodal representations to an LLM (ERNIE-Lite). It also introduces a Residue Angle Head to decode backbone geometry, with a hidden size of $d_h=4096$ and six torsion angles per residue. Training optimizes $L_{\text{NLL}}$ for text and $L_{\text{angle}}$ for structure. Empirical results show that HelixProtX outperforms baselines in description prediction and sequence design, demonstrates robust structure design capabilities, and benefits from joint training compared with task-specific approaches. The approach holds promise for accelerating protein biology research by enabling cross-modal generation and reasoning across sequences, structures, and descriptions.
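The sketch below illustrates how a residue angle head and the two training objectives could fit together. It is a minimal illustration under stated assumptions, not the authors' implementation: the summary gives only the hidden size, the number of angles, and the names of the two losses. The single linear layer, the tanh squashing, the periodic form `1 - cos(difference)` for the angle loss, and the `angle_weight` parameter are all hypothetical choices made here for illustration.

```python
import math
import random

D_H = 4096     # hidden size stated in the TL;DR
N_ANGLES = 6   # angles predicted per residue, as stated in the TL;DR


def make_angle_head(d_h=D_H, n_angles=N_ANGLES, seed=0):
    """Hypothetical linear head: one weight row and one bias per angle."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_h)
    weights = [[rng.uniform(-scale, scale) for _ in range(d_h)]
               for _ in range(n_angles)]
    biases = [0.0] * n_angles
    return weights, biases


def predict_angles(hidden, head):
    """Map one residue's hidden state to angles in the open interval (-pi, pi).

    The tanh squashing is an assumption of this sketch.
    """
    weights, biases = head
    return [
        math.pi * math.tanh(sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(weights, biases)
    ]


def nll_loss(target_probs):
    """Mean negative log-likelihood of the probabilities given to target tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def angle_loss(pred, target):
    """Assumed periodic loss: mean of 1 - cos(pred - target).

    It is zero when the angles agree and treats -pi and pi as the same angle.
    """
    return sum(1.0 - math.cos(p - t) for p, t in zip(pred, target)) / len(pred)


def total_loss(nll, angle, angle_weight=1.0):
    """Weighted sum of both objectives; angle_weight is a hypothetical knob."""
    return nll + angle_weight * angle


if __name__ == "__main__":
    head = make_angle_head()
    hidden_state = [0.01] * D_H          # stand-in for an LLM hidden state
    angles = predict_angles(hidden_state, head)
    loss = total_loss(nll_loss([0.9, 0.8]), angle_loss(angles, [0.0] * N_ANGLES))
    print(len(angles), loss)
```

A periodic loss is used here because angles wrap around: a prediction just below $\pi$ and a target just above $-\pi$ are nearly identical geometrically, which a plain squared error would penalize heavily.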
Abstract
Proteins are fundamental components of biological systems and can be represented through various modalities, including sequences, structures, and textual descriptions. Despite advances in deep learning and scientific large language models (LLMs) for protein research, current methodologies predominantly focus on a limited set of specialized tasks, often predicting one protein modality from another. These approaches restrict the understanding and generation of multimodal protein data. In contrast, large multimodal models have shown the potential to generate any-to-any content such as text, images, and videos, thus enriching user interactions across various domains. Integrating these multimodal model technologies into protein research offers significant promise by potentially transforming how proteins are studied. To this end, we introduce HelixProtX, a system built upon a large multimodal model that aims to offer a comprehensive solution for protein research by supporting any-to-any protein modality generation. Unlike existing methods, it allows any input protein modality to be transformed into any desired protein modality. The experimental results affirm the advanced capabilities of HelixProtX, not only in generating functional descriptions from amino acid sequences but also in executing critical tasks such as designing protein sequences and structures from textual descriptions. Preliminary findings indicate that HelixProtX consistently achieves superior accuracy across a range of protein-related tasks, outperforming existing state-of-the-art models. By integrating large multimodal models into protein research, HelixProtX opens new avenues for understanding protein biology, thereby promising to accelerate scientific discovery.
