On diffusion posterior sampling via sequential Monte Carlo for zero-shot scaffolding of protein motifs

James Matthew Young, O. Deniz Akyildiz

TL;DR

This work reframes motif scaffolding as an inverse problem solved with diffusion posterior sampling (DPS) under a zero-shot, unconditional backbone prior. It introduces a family of guidance potentials (e.g., $L_{\text{dist}}$, $L_{\text{framedist}}$, $L_{\text{fape}}$, $L_{\text{rmsd}}$) and extends to multi-motif and symmetry-constrained generation, enabling SE(3)-aware design without motif-conditioned retraining. A systematic comparison of SMC-based samplers (including replacement and reconstruction-guided variants) shows that reconstruction guidance paired with DPS often performs well, with some potentials matching or exceeding masking-based methods in single-motif tasks and enabling zero-shot multi-motif scaffolding. The results show that designable, motif-containing proteins can be generated zero-shot, and they provide a reusable, model-agnostic framework for protein engineering with unconditional backbone models, complemented by open-source code.

Abstract

With the advent of diffusion models, new proteins can be generated at an unprecedented rate. The motif scaffolding problem requires steering this generative process to yield proteins with a desirable functional substructure called a motif. While models have been trained to take the motif as conditional input, recent techniques in diffusion posterior sampling can be leveraged as zero-shot alternatives whose approximations can be corrected with sequential Monte Carlo (SMC) algorithms. In this work, we introduce a new set of guidance potentials for describing scaffolding tasks and solve them by adapting SMC-aided diffusion posterior samplers with an unconditional model, Genie, as a prior. In single-motif problems, we find that (i) the proposed potentials perform comparably to, if not better than, the conventional masking approach, (ii) samplers based on reconstruction guidance outperform their replacement-method counterparts, and (iii) measurement-tilted proposals and twisted targets improve performance substantially. Furthermore, as a demonstration, we provide solutions to two multi-motif problems by pairing reconstruction guidance with an SE(3)-invariant potential. We also produce designable, internally symmetric monomers with a guidance potential for point symmetry constraints. Our code is available at: https://github.com/matsagad/mres-project.
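
To make the idea of an SE(3)-invariant guidance potential concrete, the following is a minimal sketch, assuming a potential that compares pairwise distances among motif residues. It is not the paper's definition of any of its potentials: the names `pairwise_distances`, `distance_potential`, and `random_rotation`, the mean-squared form of the loss, and the toy coordinates are all illustrative assumptions. The sketch only checks numerically that such a loss is unchanged when the whole backbone is rotated and translated.

```python
import numpy as np

def pairwise_distances(coords):
    """Return the (n, n) matrix of Euclidean distances between rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def distance_potential(backbone, motif, motif_idx):
    """Hypothetical potential: mean squared mismatch between the distance
    matrix of the backbone's motif positions and that of the target motif."""
    d_gen = pairwise_distances(backbone[motif_idx])
    d_ref = pairwise_distances(motif)
    return float(np.mean((d_gen - d_ref) ** 2))

def random_rotation(rng):
    """Draw a proper rotation (det = +1) from the QR factors of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix the sign ambiguity of the QR factors
    if np.linalg.det(q) < 0:         # flip one axis if we drew a reflection
        q[:, 0] = -q[:, 0]
    return q

# Toy data (assumed): 20 C-alpha positions, 5 of which should host the motif.
rng = np.random.default_rng(0)
backbone = rng.normal(size=(20, 3)) * 5.0
motif_idx = np.array([2, 3, 4, 10, 11])
motif = rng.normal(size=(5, 3)) * 5.0

# Apply a rigid motion to the whole backbone and re-evaluate the potential.
rot = random_rotation(rng)
moved = backbone @ rot.T + np.array([1.0, -2.0, 3.0])
loss_original = distance_potential(backbone, motif, motif_idx)
loss_moved = distance_potential(moved, motif, motif_idx)
print(np.isclose(loss_original, loss_moved))
```

Because the loss depends only on distances, the sampler does not need to know where or in what orientation the motif sits in the generated backbone. A purely distance-based loss is also unchanged under reflection, so on its own it cannot distinguish a structure from its mirror image.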

Paper Structure

This paper contains 45 sections, 47 equations, 13 figures, 5 tables, 2 algorithms.

Figures (13)

  • Figure 1: Different motif scaffolding tasks. The motif in blue is contiguous, and the other in red is discontiguous. Scaffolds are illustrated in white.
  • Figure 2: An overview of the motif scaffolding experimental setup. Protein backbones are first sampled from the conditional setup with the motif as an observation. These generated structures are then inverse-folded with the motif sequence fixed and folded back into structures. Finally, metrics such as self-consistency RMSD and motif RMSD are computed between the predicted structure and both the generated structure and the motif.
  • Figure 3: (A) Performance of sampling methods on the 24 motif scaffolding benchmarks. Thirty-two backbones are sampled from each method across all the motif problems. Scaffolds that are successful and those which meet at least one of the main success criteria are reported according to their unique count. (B) Examples of the designed scaffolds. The motif, in grey, is aligned with the scaffold, in white. Most unsuccessful scaffolds either do not possess the motif in full or have poor self-consistency.
  • Figure 4: (A) Success metrics of sampling methods on the six multi-motif scaffolding benchmarks. Thirty-two scaffolds are sampled for each problem. Values for a pass in each criterion are denoted by the dashed line. Error bars shown are one standard deviation from the mean. Only samples with correct handedness were considered. (B) Examples of successful designs from the 512 samples generated via TDS-rmsd. The motifs, in colour, are aligned with the scaffold, in white.
  • Figure 5: (A) Designability of symmetric designs across several point symmetries. Sixteen scaffolds with a maximum of 128 and 256 residues were sampled for each symmetry through FPSSMC and TDS-mask. The total number of designable scaffolds is dashed atop the unique count. The success threshold for scRMSD is indicated by the dashed line. (B) Examples of the successfully designed scaffolds. The first and second rows show designs with a maximum of 128 and 256 residues, respectively. The primary axis of symmetry points directly out of the page.
  • ...and 8 more figures
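
The self-consistency RMSD and motif RMSD mentioned in the Figure 2 caption both compare two sets of backbone coordinates after optimal rigid superposition. The following is a minimal sketch of that computation using the standard Kabsch algorithm, assuming C-alpha coordinates stored as NumPy arrays. The function name `kabsch_rmsd` and the variable names in the comments are illustrative and are not taken from the paper's code.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) point sets after optimally superposing P onto Q
    with a proper rotation and a translation (Kabsch algorithm)."""
    P0 = P - P.mean(axis=0)                  # remove translation
    Q0 = Q - Q.mean(axis=0)
    H = P0.T @ Q0                            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))       # -1 if the best fit is a reflection
    D = np.diag([1.0, 1.0, d])               # forces det(R) = +1
    R = Vt.T @ D @ U.T                       # rotation taking P0 onto Q0
    P_aligned = P0 @ R.T
    return float(np.sqrt(np.mean(np.sum((P_aligned - Q0) ** 2, axis=1))))

# Hypothetical usage, with `generated` and `refolded` as (n, 3) arrays and
# `motif_idx` indexing the motif residues:
#   sc_rmsd    = kabsch_rmsd(generated, refolded)
#   motif_rmsd = kabsch_rmsd(refolded[motif_idx], motif)
```

The determinant correction restricts the fit to proper rotations, so a mirror-image structure is reported with a nonzero RMSD instead of being treated as a perfect match.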