UniZyme: A Unified Protein Cleavage Site Predictor Enhanced with Enzyme Active-Site Knowledge

Chenao Li, Shuo Yan, Enyan Dai

TL;DR

This work addresses the limitation of enzyme-specific cleavage-site predictors by introducing UniZyme, a unified model that generalizes across diverse proteolytic enzymes. It does so with a biochemically informed enzyme encoder that uses energetic frustration and 3D structure to bias transformer attention. Active-site knowledge enters through auxiliary prediction, large-scale pretraining, and active-site-aware pooling, and substrates are encoded with ESM-2. Empirical results show strong performance in both supervised and zero-shot settings, including on substrates of HIV-1 enzymes, and ablation studies confirm the contribution of energy-based features and active-site information. The approach enables cleavage-site prediction for novel enzymes and broad substrate spaces, with potential impact on drug design and enzyme engineering.
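To make the two mechanisms named above more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes the structural and energetic signals arrive as an additive pairwise bias on the attention logits, and that active-site-aware pooling is a probability-weighted mean of residue embeddings. The function names `biased_attention` and `active_site_pool` and all inputs are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive pairwise bias.

    q, k, v: lists of n residue vectors (lists of floats).
    bias:    n x n matrix added to the logits. Assumption: it is derived
             from pairwise structure or energy terms, e.g. negative distance.
    Returns (outputs, attention_weights).
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        logits = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + bias[i][j]
            for j, kj in enumerate(k)
        ]
        w = softmax(logits)
        weights.append(w)
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


def active_site_pool(h, site_prob):
    """Pool residue embeddings h into one enzyme vector, weighting each
    residue by its (assumed, predicted) active-site probability."""
    total = sum(site_prob)
    return [
        sum(p * row[c] for p, row in zip(site_prob, h)) / total
        for c in range(len(h[0]))
    ]
```

In this sketch, a strongly negative bias entry for a residue pair lowers the attention that one residue pays to the other, and residues with a high active-site probability dominate the pooled enzyme representation.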

Abstract

Enzyme-catalyzed protein cleavage is essential for many biological functions. Accurate prediction of cleavage sites can facilitate various applications such as drug development, enzyme design, and a deeper understanding of biological mechanisms. However, most existing models are restricted to an individual enzyme, which neglects knowledge shared across enzymes and fails to generalize to novel enzymes. Thus, we introduce a unified protein cleavage site predictor named UniZyme, which can generalize across diverse enzymes. To enhance the enzyme encoding for protein cleavage site prediction, UniZyme employs a novel biochemically informed model architecture along with active-site knowledge of proteolytic enzymes. Extensive experiments demonstrate that UniZyme achieves high accuracy in predicting cleavage sites across a range of proteolytic enzymes, including unseen enzymes. The code is available at https://anonymous.4open.science/r/UniZyme-4A67.

Paper Structure

This paper contains 33 sections, 17 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Illustration of enzyme-catalyzed protein hydrolysis.
  • Figure 2: Architecture of the biochemically informed enzyme encoder and overall framework of UniZyme.
  • Figure 3: Visualizations of predicted substrate cleavage sites for HIV-1 enzymes. Predicted cleavage sites are shown in red.
  • Figure 4: Ablation studies on supervised and zero-shot settings.
  • Figure 5: Hyperparameter sensitivity analysis.
  • ...and 3 more figures