Let Physics Guide Your Protein Flows: Topology-aware Unfolding and Generation

Yogesh Verma, Markus Heinonen, Vikas Garg

TL;DR

PhysFlow introduces a physics-guided, topology-preserving forward noising process for protein backbones and combines it with SE(3) flow matching to learn a physically consistent reverse generative process. By decomposing SE(3) into SO(3) and $\mathbb{R}^3$, the method constructs independent, geometry-aware flows and uses an angular backbone representation to unfold toward predefined secondary structures while preventing clashes. The learning objective combines conditional flow matching with a Look-Ahead loss and auxiliary terms, enabling both unconditional backbone generation and sequence-conditioned folding. Experimental results on a 24k-protein PDB subset show state-of-the-art designability and novelty for unconditional generation and competitive RMSD for sequence-conditioned folding, underscoring the practical impact for de novo protein design. The work highlights a principled integration of physics-based dynamics with modern generative modeling, offering scalable paths to more realistic and designable protein structures.
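
For orientation, the generic conditional flow matching objective referred to above can be written as follows. This is the standard form from the flow-matching literature, not necessarily the exact loss used in the paper. The symbols $v_\theta$, $u_t$, $x_0$, $x_1$ and the weight $\lambda$ are assumed notation, and the Look-Ahead and auxiliary terms are omitted because their form is not given here.

$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; (x_0, x_1),\; x_t \sim p_t(\cdot \mid x_0, x_1)} \left\| v_\theta(x_t, t) - u_t(x_t \mid x_0, x_1) \right\|^2$$

If each residue frame is written as a pair $(r, x)$ with $r \in \mathrm{SO}(3)$ and $x \in \mathbb{R}^3$, and the two flows are independent, such an objective would plausibly split into a rotational and a translational term, each measured in its own tangent space:

$$\mathcal{L}_{\mathrm{CFM}} = \mathcal{L}_{\mathrm{SO}(3)} + \lambda \, \mathcal{L}_{\mathbb{R}^3}$$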

Abstract

Protein structure prediction and folding are fundamental to understanding biology, with recent deep learning advances reshaping the field. Diffusion-based generative models have revolutionized protein design, enabling the creation of novel proteins. However, these methods often neglect the intrinsic physical realism of proteins because their noising dynamics lack grounding in physical principles. To address this, we first introduce a physically motivated non-linear noising process, grounded in classical physics, that unfolds proteins into secondary structures (e.g., alpha helices, linear beta sheets) while preserving topological integrity: maintaining bonds and preventing collisions. We then integrate this process with the flow-matching paradigm on SE(3) to model the invariant distribution of protein backbones with high fidelity, incorporating sequence information to enable sequence-conditioned folding and expand the generative capabilities of our model. Experimental results demonstrate that the proposed method achieves state-of-the-art performance in unconditional protein generation, producing more designable and novel protein structures while accurately folding monomer sequences into precise protein conformations.

Paper Structure

This paper contains 55 sections, 1 theorem, 45 equations, 6 figures, 6 tables.

Key Result

Proposition 1

Let $\mathbf{x} \in \mathbb{R}^{3N}$ denote the backbone Cartesian coordinates of a protein with $N$ residues, and let $\mathbf{z}(\mathbf{x})$ be its angular representation. Let $\mathcal{T}_{\mathbf{t}}: \mathbb{R}^{3N} \to \mathbb{R}^{3N}$ denote a translation by vector $\mathbf{t} \in \mathbb{R}

Figures (6)

  • Figure 1: Generation by PhysFlow and Cartesian diffusion/flow-based methods. Inference trajectories for unconditional monomer generation are compared between PhysFlow and Cartesian diffusion/flow-based methods.
  • Figure 2: PhysFlow Model Pipeline ($f_{\eta}$). The model takes as input a noised structural state together with the sequence information. These are first processed independently by a structure encoder and a sequence encoder. The resulting representations are then integrated through a combiner module, after which a structure decoder predicts both the velocity field and the auxiliary predictions.
  • Figure 3: PhysFlow Samples. Designable backbones generated unconditionally by the PhysFlow model.
  • Figure 4: Runtime Complexity.
  • Figure 5: (Left) Distribution of the protein sequence length $\ell_{prot}$ in the obtained monomer dataset. (Right) Average number of residue–residue steric collisions (defined by a threshold on the pairwise residue distance $d(\mathbf{x}_i,\mathbf{x}_j)$, in nm) observed across the forward trajectory of a protein of length 116, for Cartesian diffusion/flow baselines and variants of our PhysFlow method with different weightings $(k_1, k_2)$. We observe that the number of collisions decreases as the weight on Coulombic repulsion ($k_2$) increases within the PhysFlow framework.
  • ...and 1 more figure
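
The collision count reported in Figure 5 can be illustrated with a minimal sketch: count residue pairs whose distance falls below a cutoff. The function name `count_collisions`, the cutoff value of 0.3 nm, the `min_separation` rule that skips bonded neighbours, and the toy coordinates are all assumptions for illustration; the summary does not state the threshold used in the paper.

```python
import math
from itertools import combinations


def count_collisions(coords, threshold_nm, min_separation=2):
    """Count residue pairs (i, j) with j - i >= min_separation whose
    Euclidean distance is below threshold_nm.

    coords: list of (x, y, z) tuples in nm, one per residue.
    threshold_nm and min_separation are hypothetical parameters.
    """
    count = 0
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if j - i < min_separation:
            continue  # skip directly bonded neighbours
        if math.dist(a, b) < threshold_nm:
            count += 1
    return count


# Toy chains with 0.38 nm spacing between consecutive residues.
extended = [(0.38 * i, 0.0, 0.0) for i in range(5)]
folded = [
    (0.0, 0.0, 0.0),
    (0.38, 0.0, 0.0),
    (0.38, 0.38, 0.0),
    (0.0, 0.38, 0.0),
    (0.0, 0.1, 0.0),  # folds back close to residue 0
]

n_extended = count_collisions(extended, threshold_nm=0.3)
n_folded = count_collisions(folded, threshold_nm=0.3)
```

Averaging such a count over the frames of a forward trajectory would give a curve of the kind described in the caption.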

Theorems & Definitions (1)

  • Proposition 1: Translation Invariance of Angular Representation
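
The translation invariance named in Proposition 1 can be checked numerically with a small sketch. It assumes the angular representation consists of bond lengths, bond angles and dihedral angles along the chain, which may differ from the paper's exact definition of $\mathbf{z}(\mathbf{x})$. All function names and the toy coordinates are illustrative, and the dihedral sign convention is one common choice.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_angle(p0, p1, p2):
    """Angle at p1 between the bonds p1->p0 and p1->p2, in radians."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    c = _dot(u, v) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, c)))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle around the p1->p2 bond, in radians."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = _dot(_cross(n1, n2), b1) / _norm(b1)
    x = _dot(n1, n2)
    return math.atan2(y, x)


def angular_representation(coords):
    """Assumed z(x): bond lengths, then bond angles, then dihedrals."""
    n = len(coords)
    lengths = [_norm(_sub(coords[i + 1], coords[i])) for i in range(n - 1)]
    angles = [bond_angle(*coords[i:i + 3]) for i in range(n - 2)]
    torsions = [dihedral(*coords[i:i + 4]) for i in range(n - 3)]
    return lengths + angles + torsions


def translate(coords, t):
    return [(p[0] + t[0], p[1] + t[1], p[2] + t[2]) for p in coords]


# Toy five-atom chain in arbitrary units (not real protein data).
backbone = [
    (0.0, 0.0, 0.0),
    (1.5, 0.0, 0.0),
    (2.0, 1.4, 0.0),
    (3.3, 1.6, 0.9),
    (3.0, 2.8, 1.9),
]

z = angular_representation(backbone)
z_shifted = angular_representation(translate(backbone, (10.5, -3.2, 7.1)))
max_diff = max(abs(a - b) for a, b in zip(z, z_shifted))
```

Every quantity is built from differences of coordinates, so the translation vector cancels and `max_diff` stays at the level of floating-point rounding.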