Multi-Modal CLIP-Informed Protein Editing

Mingze Yin; Hanjing Zhou; Yiheng Zhu; Miao Lin; Yixuan Wu; Jialu Wu; Hongxia Xu; Chang-Yu Hsieh; Tingjun Hou; Jintai Chen; Jian Wu

TL;DR

ProtET improves the state-of-the-art results by a large margin, leading to significant stability improvements of 16.67% and 16.90%.

Abstract

Proteins govern most biological functions essential for life, but achieving controllable protein discovery and optimization remains challenging. Recently, machine learning-assisted protein editing (MLPE) has shown promise in accelerating optimization cycles and reducing experimental workloads. However, current methods struggle with the vast combinatorial space of potential protein edits and cannot explicitly conduct protein editing from biotext instructions, which limits their interactivity with human feedback. To fill these gaps, we propose ProtET, a novel method for efficient CLIP-informed protein editing through multi-modality learning. Our approach comprises two stages. In the pretraining stage, contrastive learning aligns the protein and biotext representations, each encoded by its own large language model (LLM). In the subsequent protein editing stage, the fused features from the editing instruction texts and the original protein sequences serve as the final editing condition for generating target protein sequences. Comprehensive experiments demonstrate the superiority of ProtET in editing proteins to enhance human-expected functionality across multiple attribute domains, including enzyme catalytic activity, protein stability, and antibody-specific binding ability. ProtET also improves on state-of-the-art results by a large margin, with stability improvements of 16.67% and 16.90%. This capability positions ProtET to advance real-world artificial protein editing, potentially addressing unmet academic, industrial, and clinical needs.
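The pretraining stage described above aligns protein and biotext embeddings contrastively. As a rough illustration only, the sketch below implements a generic CLIP-style symmetric contrastive loss in plain Python. The function names, the temperature value of 0.07, and the toy embeddings are assumptions made for this example; the abstract does not specify the exact loss used by ProtET.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[Vector]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_loss(protein_emb: List[Vector], text_emb: List[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over a batch of paired embeddings.

    protein_emb[i] and text_emb[i] are assumed to describe the same protein.
    The temperature is an illustrative default, not a value from the paper.
    """
    p = [l2_normalize(v) for v in protein_emb]
    t = [l2_normalize(v) for v in text_emb]
    # Cosine similarity between every protein and every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(pi, tj)) / temperature for tj in t]
              for pi in p]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-protein direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch: two proteins and two descriptions in a 2-dimensional space.
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch, the loss is close to zero when each protein embedding matches its own description and much larger when the pairs are swapped, which is the behaviour a contrastive alignment objective is meant to reward.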

Paper Structure (22 sections, 5 equations, 7 figures, 5 tables)

Introduction
Methods
Curated protein-biotext dataset
Multi-modality pretraining
Protein editing generator
Implementation details
Results
Protein function classification
Problem setup
Experimental results
Enzyme catalytic activity editing
Problem setup
Experimental results
Protein stability editing
Problem setup
...and 7 more sections

Figures (7)

Figure 1: An illustration of the protein-biotext pair. The textual descriptions include the protein's name, function, subcellular location, biological process, and similarity to other proteins.
Figure 2: Coverage ratios of protein property annotations in Swiss-Prot and TrEMBL.
Figure 3: The workflow and framework details of ProtET. (A) A CLIP-like contrastive pretraining aligns features of protein sequences and biotext descriptions. (B) The FiLM module and transformer-decoders for protein editing. The FiLM module integrates multimodal features from the original protein sequences and the editing instruction texts, serving as the editing condition. Based on this condition, transformer-decoders design edited protein sequences through an autoregressive generation process. (C) Details of the FiLM module. It extracts multiplicative and additive factors from text features using linear mappings. These factors conditionally optimize protein features through addition and multiplication, to create fused features. (D) Details of the Transformer decoder. It uses a multi-head self-attention module to learn comprehensive residue interactions and predicts the next residue based on the previous ones.
Figure 4: The compositional structure of the enzyme dataset. According to annotated catalytic activity scores, the enzyme dataset is divided into four subsets with different fitness levels, and we present the proportion of data for these constructed subsets.
Figure 5: The t-SNE visualization results. Different colors indicate enzymes with different fitness levels correspondingly. Enzymes with medium, low and zero fitness tend to cluster together with high-functionality enzymes after being edited by ProtET.
...and 2 more figures
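The Figure 3 caption describes the FiLM module as deriving multiplicative and additive factors from text features through linear mappings, then applying them to protein features. A minimal sketch of that idea follows, assuming the standard FiLM form gamma * h + beta applied to each residue. The helper names, the weight shapes, and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def film(protein_feats: Matrix, text_feat: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Modulate per-residue protein features with text-derived factors.

    gamma (multiplicative) and beta (additive) are each computed from the
    text feature by a linear mapping, then broadcast over all residues.
    """
    gamma = linear(w_gamma, b_gamma, text_feat)
    beta = linear(w_beta, b_beta, text_feat)
    return [[g * h + b for g, h, b in zip(gamma, residue, beta)]
            for residue in protein_feats]


# Toy example: two residues with 2-dimensional features and a 2-dimensional text feature.
protein = [[1.0, 1.0], [2.0, 3.0]]
text = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]   # makes gamma equal to the text feature
zeros = [[0.0, 0.0], [0.0, 0.0]]      # makes beta equal to its bias
fused = film(protein, text, identity, [0.0, 0.0], zeros, [0.5, -0.5])
```

Here the same gamma and beta are shared across all residues, so the instruction text rescales and shifts every position of the protein representation in the same way. Whether ProtET conditions per residue or on a pooled representation is not stated in the caption.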
