Improved motif-scaffolding with SE(3) flow matching
Jason Yim, Andrew Campbell, Emile Mathieu, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Frank Noé, Regina Barzilay, Tommi S. Jaakkola
TL;DR
This work extends FrameFlow, an SE(3) flow matching model for protein backbone generation, to motif-scaffolding through two complementary strategies: motif amortization, which trains a motif-conditioned scaffold generator, and motif guidance, which steers the sampling trajectory of an unconditional model toward the motif without additional training. On a benchmark of 24 motifs, the approach yields 2.5 times more designable and unique scaffolds than state-of-the-art diffusion-based methods, with substantially higher scaffold diversity and faster sampling. The authors also introduce a data-augmentation scheme that simulates motif-scaffold pairings from unlabeled PDB data, enabling generalization to novel motifs. Because limited structural diversity can hinder wet-lab validation, these results suggest that SE(3) flow matching is a practical route to diversity-aware motif-scaffolding. The work positions FrameFlow as a lighter, faster alternative to diffusion models and points toward broader motif-based protein design tasks, including potential extension to binders and enzymes.
Abstract
Protein design often begins with knowledge of a desired function encoded in a motif, around which motif-scaffolding aims to construct a functional protein. Recently, generative models have achieved breakthrough success in designing scaffolds for a range of motifs. However, generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow without additional training. On a benchmark of 24 biologically meaningful motifs, we show that our method produces 2.5 times more designable and unique motif-scaffolds than the state of the art. Code: https://github.com/microsoft/protein-frame-flow
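The data augmentation strategy behind motif amortization can be pictured as carving a pseudo-motif out of an unlabeled backbone, so that every training structure supplies its own motif and scaffold. The sketch below is only an illustration of that idea under stated assumptions, not the authors' implementation: the function names, the fraction range, the segment count, and the use of random coordinates in place of real PDB backbones are all hypothetical.

```python
import numpy as np


def sample_motif_mask(num_residues, rng, min_frac=0.05, max_frac=0.5, max_segments=4):
    """Return a boolean mask that marks a randomly chosen pseudo-motif.

    All parameter names and default values are hypothetical.
    """
    # Draw the fraction of residues to treat as motif, keeping at least one.
    frac = rng.uniform(min_frac, max_frac)
    target = max(1, int(round(frac * num_residues)))

    # Spread the motif over a few contiguous segments of equal length.
    num_segments = int(rng.integers(1, max_segments + 1))
    seg_len = max(1, target // num_segments)

    mask = np.zeros(num_residues, dtype=bool)
    for _ in range(num_segments):
        # Segments may overlap, so the motif can be smaller than `target`.
        start = int(rng.integers(0, num_residues - seg_len + 1))
        mask[start:start + seg_len] = True
    return mask


def split_motif_scaffold(coords, mask):
    """Split per-residue coordinates into (motif, scaffold) parts."""
    return coords[mask], coords[~mask]


rng = np.random.default_rng(0)
coords = rng.normal(size=(100, 3))  # stand-in for C-alpha coordinates
mask = sample_motif_mask(len(coords), rng)
motif, scaffold = split_motif_scaffold(coords, mask)
```

Resampling the mask each time a structure is drawn would expose a conditional model to many different motif placements from the same unlabeled backbone, which is the intuition behind training on simulated motif-scaffold pairs.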
