Multi-state Protein Design with DynamicMPNN

Alex Abrudan, Sebastian Pujalte Ojeda, Chaitanya K. Joshi, Matthew Greenig, Felipe Engelberger, Alena Khmelinskaia, Jens Meiler, Michele Vendruscolo, Tuomas P. J. Knowles

TL;DR

DynamicMPNN addresses the challenge of designing proteins that adopt multiple conformations by learning a joint distribution over sequences conditioned on multiple structural states, $p(Y|X_1,...,X_m)$. The model encodes each conformation and its binding context into a shared latent space and autoregressively decodes sequences, using SE(3)-equivariant GVP-based graphs and two pooling strategies to handle nonidentical sequences. A 46,033-cluster multi-conformational dataset built from CoDNaS expands coverage to 75% of CATH, and AlphaFold 3-based template evaluation shows that DynamicMPNN improves over ProteinMPNN MSD by up to 25% in decoy-normalized RMSD and 12% in sequence recovery. This explicit multi-state training framework enables the design of sequences that satisfy multiple conformational constraints, with implications for engineering bioswitches, allosteric regulators, and molecular machines, while highlighting opportunities for specialized models per conformation class.

Abstract

Structural biology has long been dominated by the "one sequence, one structure, one function" paradigm, yet many critical biological processes - from enzyme catalysis to membrane transport - depend on proteins that adopt multiple conformational states. Existing multi-state design approaches rely on post-hoc aggregation of single-state predictions and achieve poor experimental success rates compared to single-state design. We introduce DynamicMPNN, an inverse folding model explicitly trained to generate sequences compatible with multiple conformations through joint learning across conformational ensembles. Trained on 46,033 conformational pairs covering 75% of CATH superfamilies and evaluated using AlphaFold 3, DynamicMPNN outperforms ProteinMPNN by up to 25% on decoy-normalized RMSD and by 12% on sequence recovery across our challenging multi-state protein benchmark.
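To make "post-hoc aggregation of single-state predictions" and "sequence recovery" concrete, the following is a minimal sketch, not the authors' code. It assumes that aggregation means averaging per-position log-probabilities across states and that recovery is the fraction of designed residues identical to the native sequence; the abstract does not specify either definition. All function names (`aggregate_states`, `greedy_sequence`, `sequence_recovery`, `peaked`) and the toy numbers are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aggregate_states(per_state_log_probs):
    """Average per-position log-probabilities over states.

    Assumed form of post-hoc aggregation. Input is a list of states; each
    state is a list of positions; each position is a list of 20
    log-probabilities ordered as AMINO_ACIDS.
    """
    n_states = len(per_state_log_probs)
    n_positions = len(per_state_log_probs[0])
    averaged = []
    for pos in range(n_positions):
        averaged.append([
            sum(state[pos][aa] for state in per_state_log_probs) / n_states
            for aa in range(len(AMINO_ACIDS))
        ])
    return averaged


def greedy_sequence(log_probs):
    """Pick the highest-scoring amino acid at each position."""
    return "".join(
        AMINO_ACIDS[max(range(len(AMINO_ACIDS)), key=lambda aa: row[aa])]
        for row in log_probs
    )


def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)


def peaked(aa, p=0.9):
    """Toy log-probabilities placing probability p on one amino acid."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return [math.log(p if a == aa else rest) for a in AMINO_ACIDS]


# Two hypothetical single-state predictions that disagree at position 2.
state_a = [peaked("A"), peaked("L", 0.6), peaked("K")]
state_b = [peaked("A"), peaked("V", 0.9), peaked("K")]

design = greedy_sequence(aggregate_states([state_a, state_b]))
recovery = sequence_recovery(design, "ALK")
print(design, recovery)
```

In this toy case the two states disagree at the second position, and averaging simply lets the more confident state win. A jointly trained model, by contrast, conditions on all states at once when scoring each residue.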

Paper Structure

This paper contains 17 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: DynamicMPNN for multi-state protein design. (a) Examples of proteins with multiple conformational states: transporters in closed and open states (PDB: 6NC7, 6NC9), a metamorphic protein with alternative folds (PDB: 4QHH, 4QHF), and hinges showing domain movement (PDB: 5D0W, 1CFC). (b) Schematic of DynamicMPNN, an inverse folding model trained to generate protein sequences with multiple conformational states. Conformations are encoded with their respective chemical environments (i.e. interaction partners, shown in gray). Solid lines show the flow of information in the model, while dashed lines show the evaluation pipeline using AlphaFold 3 (AF3), which employs target structures as templates during inference and measures the deviations between predicted and target structures, with decoy structures serving as negative controls.
  • Figure 2: Multi-state protein dataset. (a) Data processing pipeline used to construct sequence-aligned structure pairs. (b) Distribution of the number of conformations per CoDNaS cluster. (c) Distribution of the maximum C$\alpha$-RMSD between pairs of structures in each CoDNaS cluster.
  • Figure 3: Sequence recovery performance across DynamicMPNN model variants and the ProteinMPNN baseline on the multi-state protein benchmark ($n=96$). Combined training approaches achieve the highest performance, whereas models trained on only multi-state or only single-state data perform poorly.
  • Figure 4: Switch Arc protein case study. (a, b) ProteinMPNN and (c, d) DynamicMPNN best-design structure predictions (pink and salmon, respectively) overlaid on the two Arc states, PDB IDs 1BDT and 1QTG, respectively (grey). The DynamicMPNN design recapitulates the beta sheet fold (c), but the ProteinMPNN design does not (a).
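The per-cluster quantity in Figure 2(c), the maximum C$\alpha$-RMSD between pairs of structures, can be illustrated with a short sketch. It assumes the conformations are already residue-aligned and superposed in a common frame; the superposition step (for example, a Kabsch fit) is deliberately omitted, and the source does not state how the authors computed it. The names `ca_rmsd` and `max_pairwise_rmsd` and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already superposed and residue-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def max_pairwise_rmsd(cluster):
    """Largest C-alpha RMSD over all pairs of conformations in a cluster."""
    return max(ca_rmsd(a, b) for a, b in combinations(cluster, 2))


# Toy cluster of three 2-residue "conformations" (made-up coordinates, in angstroms).
cluster = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0)],
    [(0.0, 4.0, 0.0), (3.8, 4.0, 0.0)],
]
largest = max_pairwise_rmsd(cluster)
print(largest)
```

For real structures, each pair would first need an optimal superposition, since RMSD computed on unsuperposed coordinates also reflects rigid-body offsets rather than conformational change alone.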