EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Chenqing Hua, Yong Liu, Dinghuai Zhang, Odin Zhang, Sitao Luan, Kevin K. Yang, Guy Wolf, Doina Precup, Shuangjia Zheng

TL;DR

This work introduces EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets for specific substrates and catalytic reactions.

Abstract

Enzyme design is a critical area in biotechnology, with applications ranging from drug development to synthetic biology. Traditional methods for enzyme function prediction or protein binding pocket design often fall short in capturing the dynamic and complex nature of enzyme-substrate interactions, particularly in catalytic processes. To address these challenges, we introduce EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets for specific substrates and catalytic reactions. Additionally, we introduce a large-scale, curated, and validated dataset of enzyme-reaction pairs, specifically designed for the catalytic pocket generation task, comprising a total of $328,192$ pairs. By incorporating evolutionary dynamics and reaction-specific adaptations, EnzymeFlow becomes a powerful model for designing enzyme pockets capable of catalyzing a wide range of biochemical reactions. Experiments on the new dataset demonstrate the model's effectiveness in designing high-quality, functional enzyme catalytic pockets, paving the way for advancements in enzyme engineering and synthetic biology. We provide the EnzymeFlow code at https://github.com/WillHua127/EnzymeFlow, with a notebook demonstration at https://github.com/WillHua127/EnzymeFlow/blob/main/enzymeflow_demo.ipynb.
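
The abstract names flow matching without spelling out the objective, so the following is only a generic sketch of the idea, assuming a straight-line interpolant between a noise sample at t=0 and a target at t=1. It is not EnzymeFlow's implementation: the function names and toy coordinates are hypothetical, and the paper's actual parameterization (backbone frames, amino acid types, substrate and product conditioning) is not reproduced here.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return [[(1.0 - t) * a + t * b for a, b in zip(p, q)]
            for p, q in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target a velocity network would be trained on: x1 - x0."""
    return [[b - a for a, b in zip(p, q)] for p, q in zip(x0, x1)]

def euler_sample(x0, velocity_fn, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = [list(p) for p in x0]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [[a + dt * b for a, b in zip(p, q)] for p, q in zip(x, v)]
    return x

rng = random.Random(0)
# Toy data (hypothetical): 8 points in 3D standing in for pocket coordinates.
x1 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]  # target
x0 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]  # noise

def oracle_velocity(x, t):
    """Stand-in for a trained network: the exact field pointing at x1."""
    return [[(b - a) / (1.0 - t) for a, b in zip(p, q)]
            for p, q in zip(x, x1)]

x_final = euler_sample(x0, oracle_velocity, n_steps=10)
```

In an actual model the oracle would be replaced by a learned network regressed onto `target_velocity` at random times t, and sampling would start from noise alone.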

Paper Structure

This paper contains 39 sections, 19 equations, 14 figures, and 3 tables.

Figures (14)

  • Figure 1: Enzyme-substrate Mechanism.
  • Figure 2: (a) Enzyme pocket extraction workflow with AlphaFill. (b) Quality analysis of clustering between enzyme pockets and full structures; good clusters have high functional concentration.
  • Figure 3: Overview of EnzymeFlow with hierarchical pre-training and enzyme-reaction co-evolution. (1) Flow model pre-trained on protein backbones and amino acid types. (2) Flow model further pre-trained on protein binding pockets, conditioned on ligand molecules with geometry-specific optimization. (3) Flow model fine-tuned on enzyme catalytic pockets, and conditioned on substrate and product molecules, with enzyme-reaction co-evolution and EC-class generation.
  • Figure 4: Catalytic pocket design example using EnzymeFlow (UniProt: Q7U4P2). The pocket generation is conditioned on reaction CN[C@H](C(=O)C)CS.C/C=C\1/C(=C/c2[nH]c(c(c2C)CCC(=O)O)/C=C/2\N=C(C(=C2CCC(=O)O)C)C[C@H]2NC(=O)C(=C2C)C=C)/NC(=O)[C@@H]1C $\rightarrow$ CN[C@H](C(=O)C)CSC(C1=C(C)C(=O)N[C@H]1Cc1[nH]c(c(c1C)CCC(=O)O)/C=C/1\N=C(C(=C1CCC(=O)O)C)C[C@H]1NC(=O)C(=C1C)C=C)C of EC4 (lyase enzyme), from $t=0$ to $t=1$.
  • Figure 5: Case study of catalytic pocket design (UniProt: B8MXP5). We show the reference and designed pockets of different models. The pocket generation is conditioned on reaction OC[C@H]1O[C@@H](Oc2ccccc2/C=C\C(=O)O)[C@@H]([C@H]([C@@H]1O)O)O $\rightarrow$ OC(=O)/C=C\c1ccccc1O of EC3.
  • ...and 9 more figures
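
The reaction conditions in the captions above are written as substrate SMILES, an arrow, and product SMILES. A minimal sketch of splitting such a string into its molecules follows. It assumes the arrow corresponds to the `>>` separator of the reaction-SMILES convention, that `.` separates molecules on each side, and that the bond-direction marker is a single backslash. The helper name `parse_reaction` is hypothetical and not taken from the EnzymeFlow code.

```python
def parse_reaction(reaction: str):
    """Split 'substrates>>products' into two lists of SMILES strings."""
    left, sep, right = reaction.partition(">>")
    if not sep:
        raise ValueError("expected a '>>' separator between substrates and products")
    # '.' separates disconnected molecules within one side of the reaction.
    substrates = [s for s in left.strip().split(".") if s]
    products = [p for p in right.strip().split(".") if p]
    return substrates, products

# Reaction from the Figure 5 caption (EC3), written with '>>' for the arrow.
rxn = (r"OC[C@H]1O[C@@H](Oc2ccccc2/C=C\C(=O)O)[C@@H]([C@H]([C@@H]1O)O)O"
       r">>OC(=O)/C=C\c1ccccc1O")
substrates, products = parse_reaction(rxn)
```

The sketch does no chemical validation; a cheminformatics toolkit would be needed to check that each fragment is a valid molecule.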