SE(3) diffusion model with application to protein backbone generation

Jason Yim; Brian L. Trippe; Valentin De Bortoli; Emile Mathieu; Arnaud Doucet; Regina Barzilay; Tommi Jaakkola

TL;DR

This work develops FrameDiff, a theoretically grounded SE(3) invariant diffusion model over multiple backbone frames for de novo protein backbone generation. It provides a principled forward diffusion on SE(3)^N, derives SE(3) invariant training by centering the frame set, and implements FramePred to predict both denoised frames and per-residue torsions using SE(3)-equivariant networks. Empirically, FrameDiff can generate designable, diverse monomer backbones up to length 500 without pretrained structure predictors, yielding samples that generalize beyond known PDB structures and approach the performance of pretrained baselines on designability. The framework advances diffusion on Lie groups and offers a foundation for scalable, principled design in proteins and other SE(3)-based domains, with potential extensions to conditional sequence-to-structure tasks and robotics applications.
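The centering idea mentioned above can be illustrated with a minimal NumPy sketch. It assumes the frame translations are stored as an `(N, 3)` array; the function name `center_translations` and the sample data are hypothetical and not taken from the paper. The sketch only shows that subtracting the mean translation removes any dependence on a global shift. It is not the paper's training procedure.

```python
import numpy as np

def center_translations(x):
    """Subtract the mean translation so the N translations sum to zero.

    x: array of shape (N, 3), one translation per frame (assumed layout).
    """
    return x - x.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))           # hypothetical frame translations
shift = np.array([10.0, -3.0, 2.5])   # arbitrary global translation

centered = center_translations(x)
centered_shifted = center_translations(x + shift)

# The centered representation is unchanged by the global shift.
print(np.allclose(centered, centered_shifted))
```

Because the centered translations are identical before and after the shift, any function of them is translation invariant by construction. Global rotations still have to be handled by the equivariance of the network.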

Abstract

The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in $\mathbb{R}^3$, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3) invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for learning the SE(3) equivariant score over multiple frames. We apply FrameDiff on monomer backbone generation and find it can generate designable monomers up to 500 amino acids without relying on a pretrained protein structure prediction network that has been integral to previous methods. We find our samples are capable of generalizing beyond any known protein structure.

Paper Structure (74 sections, 33 theorems, 126 equations, 7 figures, 4 tables, 3 algorithms)

Introduction
Preliminaries and Notation
Backbone parameterization.
Diffusion modeling on manifolds.
Diffusion models on $\mathrm{SE}(3)$
Forward diffusion on $\mathrm{SE}(3)$
Denoising score matching on $\mathrm{SE}(3)$
$\mathrm{SE}(3)$ invariance through centered $\mathrm{SE}(3)^N$
Protein backbone diffusion model
$\mathrm{FramePred}$: score and torsion prediction
Training losses
Sampling
Experiments
Monomeric protein generation and evaluation
Results
...and 59 more sections

Key Result

Proposition 2.1

Let $\mathrm{T}_\mathrm{F} > 0$ and let $\overleftarrow{\mathbf{X}}^{(t)}$ denote the reverse-time process initialized at $\overleftarrow{\mathbf{X}}^{(0)} \stackrel{d}{=} \mathbf{X}^{(\mathrm{T}_\mathrm{F})}$, whose drift depends on $p_t$, the density of $\mathbf{X}^{(t)}$. Then, under mild assumptions on $\mathcal{M}$ and $p_0$, we have $\overleftarrow{\mathbf{X}}^{(t)} \stackrel{d}{=} \mathbf{X}^{(\mathrm{T}_\mathrm{F}-t)}$.
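For orientation, a time-reversal statement of this kind usually pairs a forward SDE with a backward SDE whose drift is corrected by the score. The sketch below assumes a forward process with a generic drift $b$ and unit-scale Brownian motion $\mathbf{B}_{\mathcal{M}}^{(t)}$ on $\mathcal{M}$. Both $b$ and this specific form are assumptions for illustration, since the defining equation is not reproduced in this summary.

```latex
% Forward process (assumed generic drift b, Brownian motion on M)
\mathrm{d}\mathbf{X}^{(t)}
  = b\big(\mathbf{X}^{(t)}\big)\,\mathrm{d}t
  + \mathrm{d}\mathbf{B}_{\mathcal{M}}^{(t)},
\qquad
% Time reversal: drift flips sign and gains the score of p_{T_F - t}
\mathrm{d}\overleftarrow{\mathbf{X}}^{(t)}
  = \Big\{ -b\big(\overleftarrow{\mathbf{X}}^{(t)}\big)
  + \nabla \log p_{\mathrm{T}_\mathrm{F}-t}\big(\overleftarrow{\mathbf{X}}^{(t)}\big) \Big\}\,\mathrm{d}t
  + \mathrm{d}\mathbf{B}_{\mathcal{M}}^{(t)}.
```

Under this form, the only unknown quantity in the backward dynamics is the score $\nabla \log p_t$, which is what a diffusion model is trained to approximate.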

Figures (7)

Figure 1: Method overview. (A) Backbone parameterization with frames. Each residue along the protein chain shares the same structure of backbone atoms due to the fixed bonds between each atom. Performing the Gram-Schmidt operation on vectors $v_1, v_2$ results in a rotation matrix $r$ that parameterizes the $\texttt{N} - \texttt{C}_\alpha - \texttt{C}$ placements with respect to the frame translation, $x$, set to the $\texttt{C}_\alpha$ coordinates. An additional torsion angle, $\psi$, is required to determine the placement of the oxygen atom, $\texttt{O}$. (B) Inference is performed by sampling $N$ frames initialized from the reference distribution over rotations and translations. Then a time-reversed $\mathrm{SE}(3)$ diffusion is run from $t=\mathrm{T}_\mathrm{F}$ to $t=0$, at which point the $\psi$ angle is predicted. The final frames and $\psi$ angles are used to construct the protein backbone atoms.
Figure 2: Single layer of $\mathrm{FrameDiff}$. Each layer takes in the current node embedding $\mathbf{h}_\ell$, edge embedding $\mathbf{z}_\ell$, frames $\mathbf{T}_\ell$, and initial node embedding $\mathbf{h}_0$. Rectangles indicate trainable neural networks. Node embeddings are first updated using IPA with a skip connection. Before the Transformer, the initial node embeddings and post-IPA embeddings are concatenated. After the Transformer, we include a skip connection with the post-IPA embeddings. The updated node embeddings $\mathbf{h}_{\ell+1}$ are then used to update the edge embeddings $\mathbf{z}_{\ell+1}$ as well as to predict the frame updates $\mathbf{T}_{\ell+1}$. See the architecture section for in-depth details.
Figure 3: Designability, diversity, and novelty of $\mathrm{FrameDiff}$ generated backbones with $\zeta=0.1$, $N_\mathrm{steps}=500$, $N_\mathrm{seq}=100$. (A) $\texttt{scRMSD}$ based on 100 backbone samples for each of the lengths 70, 100, 200, and 300, with $N_\mathrm{seq} \in \{8, 100\}$, plotted in the same manner as in RFdiffusion. (B) Scatter plot of designability ($\texttt{scRMSD}$) vs. novelty ($\texttt{pdbTM}$) across lengths. (C) Selected novel and highly designable samples from panel (B). Left: sampled backbones from $\mathrm{FrameDiff}$. Middle: best ESMFold predictions with high confidence (pLDDT). Right: samples aligned with their closest PDB chain.
Figure 4: Variance schedules for translations and rotations using the hyperparameters given in the hyperparameters section. For rotations, we use a logarithmic $\sigma$ such that the variance decays more slowly and more closely matches the translation variance schedule.
Figure 5: Designability test. Using $\mathrm{FrameDiff}$, we sample a backbone starting from noise, then sample multiple ($N_{\mathrm{seq}}$) sequences with $\mathrm{ProteinMPNN}$ [dauparas2022protmpnn]. Each sequence is then folded with ESMFold [lin2022evolutionary] to obtain the predicted backbone, which is scored against the sampled backbone with RMSD ($\texttt{scRMSD}$) or TM-score ($\texttt{scTM}$). This framework also gives a method for generating a full protein with sequence and sidechains starting from a generated backbone.
...and 2 more figures
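The frame construction described in Figure 1(A) can be sketched in NumPy as follows. The caption does not define $v_1$ and $v_2$, so this sketch assumes $v_1 = \texttt{C} - \texttt{C}_\alpha$ and $v_2 = \texttt{N} - \texttt{C}_\alpha$. The function name `frame_from_backbone` and the coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np

def frame_from_backbone(n, ca, c):
    """Return (r, x): a rotation matrix and translation for one residue.

    n, ca, c: arrays of shape (3,) with the N, C-alpha, and C coordinates.
    """
    v1 = c - ca                          # assumed first vector
    v2 = n - ca                          # assumed second vector
    e1 = v1 / np.linalg.norm(v1)         # normalize the first axis
    u2 = v2 - np.dot(e1, v2) * e1        # remove the component along e1
    e2 = u2 / np.linalg.norm(u2)         # normalize the second axis
    e3 = np.cross(e1, e2)                # third axis completes a right-handed basis
    r = np.stack([e1, e2, e3], axis=-1)  # basis vectors as columns
    return r, ca                         # translation is the C-alpha position

# Hypothetical coordinates, not taken from any real structure.
ca = np.array([1.0, 2.0, 3.0])
n = ca + np.array([-0.5, 1.4, 0.0])
c = ca + np.array([1.5, 0.0, 0.0])
r, x = frame_from_backbone(n, ca, c)

print(np.allclose(r.T @ r, np.eye(3)), np.isclose(np.linalg.det(r), 1.0))
```

The cross product in the last step guarantees a determinant of +1, so `r` is a proper rotation rather than a reflection. The pair `(r, x)` is then an element of $\mathrm{SE}(3)$.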

Theorems & Definitions (54)

Proposition 2.1: Time-reversal [debortoli2022Riemannian]
Proposition 3.1: Metric on $\mathrm{SE}(3)$
Proposition 3.2: Brownian motion on compact Lie groups
Proposition 3.3: Brownian motion on $\mathrm{SO}(3)$
Proposition 3.4: Score on $\mathrm{SO}(3)$
Proposition 3.5: Disintegration of measures on $\mathrm{SE}(3)^N$
Proposition 3.6: $G$-invariance and SDEs
Corollary 3.7
Proposition 3.1
proof
...and 44 more
