Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation

Fei Wang; Xinye Zheng; Kun Li; Yanyan Wei; Yuxin Liu; Ganpeng Hu; Tong Bao; Jingwen Yang

Abstract

Kinetic parameter prediction aims to quantify how efficiently an enzyme catalyzes the conversion of a specific substrate under defined biochemical conditions. Canonical parameters such as the turnover number ($k_\text{cat}$), Michaelis constant ($K_\text{m}$), and inhibition constant ($K_\text{i}$) depend jointly on the enzyme sequence, the substrate chemistry, and the conformational adaptation of the active site during binding. Many learning pipelines reduce this process to a static compatibility problem between enzyme and substrate, fusing their representations through shallow operations and regressing a single value. Such formulations overlook the staged nature of catalysis, which involves both substrate recognition and conformational adaptation. To address this, we reformulate kinetic prediction as a staged multimodal conditional modeling problem and introduce the Enzyme-Reaction Bridging Adapter (ERBA), which injects cross-modal information into Protein Language Models (PLMs) via fine-tuning while preserving their biochemical priors. ERBA performs conditioning in two stages: Molecular Recognition Cross-Attention (MRCA) first injects substrate information into the enzyme representation to capture specificity; Geometry-aware Mixture-of-Experts (G-MoE) then integrates active-site structure and routes samples to pocket-specialized experts to reflect induced fit. To maintain semantic fidelity, Enzyme-Substrate Distribution Alignment (ESDA) enforces distributional consistency within the PLM manifold in a reproducing kernel Hilbert space. Across three kinetic endpoints and multiple PLM backbones, ERBA delivers consistent gains and stronger out-of-distribution performance than sequence-only and shallow-fusion baselines, offering a biologically grounded route to scalable kinetic prediction and a foundation for adding cofactors, mutations, and time-resolved structural cues.
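To make the idea of distributional consistency in a reproducing kernel Hilbert space concrete, the following is a minimal sketch of one common RKHS discrepancy, the maximum mean discrepancy (MMD) with a Gaussian kernel. The abstract does not say which measure ESDA uses, so the choice of MMD, the kernel, the bandwidth, and the toy embeddings are all assumptions made for illustration and are not the authors' implementation.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))

# Hypothetical toy embeddings (assumption): "prior" stands in for frozen PLM
# features, the other two for features after multimodal conditioning.
prior = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
adapted_close = [[0.05, 0.0], [0.0, 0.05], [-0.05, -0.05]]
adapted_far = [[2.0, 2.0], [2.1, 1.9], [1.9, 2.1]]

drift_close = mmd_squared(prior, adapted_close)
drift_far = mmd_squared(prior, adapted_far)
```

In this toy setting, `drift_far` is larger than `drift_close`, so a penalty of this kind would discourage adapted representations from drifting away from the prior PLM distribution.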

Paper Structure (58 sections, 15 equations, 7 figures, 9 tables)

Introduction
Related Work
Enzyme Kinetic Parameter Prediction
Transformer-based Protein Language Models
Multimodal Integration for Enzyme Modeling
Methodology
Preliminaries
Problem Definition.
Existing Formulation.
Mechanism-Aligned Formulation.
Molecular Recognition Cross-Attention
Geometry-aware Mixture-of-Experts
Enzyme-Substrate Distribution Alignment
Enzyme Reaction Optimization
Experiments
...and 43 more sections

Figures (7)

Figure 1: Architecture of the proposed ERBA. It augments a sequence-only PLM with multimodal conditioning on substrate chemistry and pocket geometry. MRCA injects substrate fingerprints into enzyme embeddings to capture recognition specificity, and G-MoE integrates local 3D pocket structure to model conformational adaptation. Update and query paths couple both modules to the backbone, while ESDA aligns representations with the prior PLM, enabling accurate prediction of enzyme kinetic parameters.
Figure 2: G-Gating & Router pools sequence-substrate and structural cues to produce gating logits, activate the top-k geometry-relevant experts, and suppress others. Selected experts transform structure-conditioned features, and an MLP fuses them into a geometry-adaptive representation for kinetic regression.
Figure 3: Log-scaled experimental versus predicted values for the kinetic parameters $k_{\text{cat}}$, $K_\text{m}$, and $K_\text{i}$. Each plot reports the percentage of predictions with absolute error less than or equal to 1, denoted as 1-$\text{Radio}_{\text{AE}}$. The dashed red line represents perfect predictions.
Figure 4: Ablation studies on fusion order and manner. Comparison of different fusion strategies: $\mathbf{S}_e$→$\mathbf{S}_g$→$\mathbf{S}_m$, Concat & MLP, and the proposed $\mathbf{S}_e$→$\mathbf{S}_m$→$\mathbf{S}_g$/ERBA. Percentage improvements across metrics are highlighted in red.
Figure 5: Error distribution comparison across backbone models and ESM sizes. The panels show the error distribution of predicted versus experimental values for three kinetic parameters ($k_\text{cat}$, $K_\text{m}$, and $K_\text{i}$) across four backbone models: (a) Ankh3-5.7B [Alsamkary2025Ankh3], (b) ProtT5-3B [elnaggar2021prottrans], (c) ESM2-650M [Lin2023ESM2], and (d) ESM2-3B [Lin2023ESM2], each augmented with ERBA. The plots show the proportion of predictions with absolute error less than or equal to 1, denoted as 1-$\text{Radio}_{\text{AE}}$.
...and 2 more figures
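The routing step described in the Figure 2 caption (gating logits, top-k expert activation, suppression of the rest) can be sketched roughly as follows. This is an illustrative sketch only: renormalizing with a softmax over the selected experts is a common sparse mixture-of-experts convention assumed here, and the function names, logits, and expert outputs are hypothetical rather than taken from the paper.

```python
import math

def top_k_gate(logits, k):
    """Per-expert weights: softmax over the k largest logits, 0.0 elsewhere."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits (ties broken by lower index).
    top = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    # Numerically stable softmax restricted to the selected experts.
    max_logit = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - max_logit) for i in top}
    norm = sum(exps.values())
    return [exps[i] / norm if i in exps else 0.0 for i in range(len(logits))]

def mix_experts(expert_outputs, weights):
    """Weighted sum of expert output vectors; suppressed experts are skipped."""
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for output, weight in zip(expert_outputs, weights):
        if weight == 0.0:
            continue
        for j in range(dim):
            mixed[j] += weight * output[j]
    return mixed

# Hypothetical gating logits for four experts (assumption).
gate_logits = [0.2, 2.0, -1.0, 2.0]
weights = top_k_gate(gate_logits, k=2)

# Hypothetical expert outputs; only experts 1 and 3 contribute.
expert_outputs = [[1.0, 1.0], [2.0, 0.0], [9.0, 9.0], [0.0, 4.0]]
fused = mix_experts(expert_outputs, weights)
```

Only the selected experts receive nonzero weight, so the fused vector ignores the outputs of suppressed experts regardless of their magnitude.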
