Floating Anchor Diffusion Model for Multi-motif Scaffolding

Ke Liu; Weian Mao; Shuaike Shen; Xiaoran Jiao; Zheng Sun; Hao Chen; Chunhua Shen

TL;DR

FADiff is the first work to tackle scaffolding multiple motifs without relying on prior knowledge of the relative motif positions in the protein; it guarantees the presence of the motifs and automates the design of their positions.

Abstract

Motif scaffolding seeks to design scaffold structures that build proteins whose functions derive from the desired motifs, which is crucial for designing vaccines and enzymes. Previous works approach the problem by inpainting or conditional generation. Both approaches can only scaffold motifs at fixed positions, and conditional generation cannot guarantee that the motifs are present. However, prior knowledge of the relative motif positions in a protein is rarely available, and building a single protein with multiple functions is more general and significant because of the synergies between functions. We propose a Floating Anchor Diffusion (FADiff) model. FADiff allows motifs to float rigidly and independently during diffusion, which guarantees the presence of the motifs and automates the design of their positions. Our experiments demonstrate the efficacy of FADiff, with high success rates and designable, novel scaffolds. To the best of our knowledge, FADiff is the first work to tackle scaffolding multiple motifs without relying on prior knowledge of the relative motif positions in the protein. Code is available at https://github.com/aim-uofa/FADiff.
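The core idea of letting each motif "float rigidly and independently" can be illustrated with a minimal sketch: every atom of a motif receives the same rotation about the motif's geometric center, followed by the same translation, so the motif's internal geometry is unchanged while its placement in the protein moves. This is not FADiff's implementation. The function names, the axis-angle (Rodrigues) parameterization of the rotation, and the hand-picked rotations and shifts are illustrative assumptions, and the diffusion noise itself is omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Tuple[Point, Point, Point]


def centroid(points: Sequence[Point]) -> Point:
    """Geometric center of a motif (origin of its virtual frame in this sketch)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def axis_angle_to_matrix(axis: Point, theta: float) -> Matrix:
    """Rotation matrix from a (not necessarily unit) axis and an angle, via Rodrigues' formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    return (
        (c + x * x * C, x * y * C - z * s, x * z * C + y * s),
        (y * x * C + z * s, c + y * y * C, y * z * C - x * s),
        (z * x * C - y * s, z * y * C + x * s, c + z * z * C),
    )


def float_motif_rigidly(points: Sequence[Point], rot: Matrix, shift: Point) -> List[Point]:
    """Rotate a motif about its own center by `rot`, then move that center by `shift`.

    All atoms share one rigid transform, so intra-motif distances are preserved.
    """
    c = centroid(points)
    moved = []
    for p in points:
        local = [p[i] - c[i] for i in range(3)]
        rotated = [sum(rot[i][j] * local[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(c[i] + shift[i] + rotated[i] for i in range(3)))
    return moved


def pairwise_distances(points: Sequence[Point]) -> List[float]:
    """All intra-motif atom-atom distances, in a fixed order."""
    return [math.dist(a, b) for k, a in enumerate(points) for b in points[k + 1:]]


# Two toy motifs (hypothetical coordinates), each moved by its own rigid transform.
motif_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]
motif_b = [(6.0, 1.0, 0.0), (7.2, 1.9, 0.4), (8.1, 0.8, 1.1)]
shift_a = (3.0, -1.0, 2.0)
new_a = float_motif_rigidly(motif_a, axis_angle_to_matrix((1.0, 2.0, 0.5), 0.7), shift_a)
new_b = float_motif_rigidly(motif_b, axis_angle_to_matrix((0.0, 0.0, 1.0), -1.1), (-2.0, 0.5, 0.0))
```

Because each motif has its own rotation and translation, the distances within a motif stay fixed while the relative placement of the two motifs changes; in FADiff those per-motif transforms would be driven by the model's reformulated noise and updates rather than chosen by hand.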

Paper Structure (62 sections, 41 equations, 12 figures, 6 tables)

Introduction
Related Works
  Motif scaffolding problem
  Generative models for scaffolding motifs
Preliminaries and Notation
  The multi-motif scaffolding problem
  Backbone parameterization
  Diffusion model for protein backbone generation
  Additional notations
Floating Anchor Diffusion
  Forward diffuse the protein backbone
    Rotation
    Translation
  Denoising score matching
  Frame update
...and 47 more sections

Figures (12)

Figure 1: In the denoising process, we keep the motifs translating and rotating rigidly: their internal structure is maintained while their positions in the protein remain flexible. Orange and blue indicate the anchor motifs that float rigidly; green indicates the scaffold residues. Each colored coordinate system denotes a motif's virtual coordinate frame, centered at the motif's geometric center.
Figure 2: A) Given multiple motifs with internal structure ${\mathcal{M}}_{\rm X}$ and ${\mathcal{M}}_{\mathcal{A}}$, we specify the sequence positions of the residues by finding the shortest chain with a greedy algorithm, as in the traveling salesman problem (TSP), where the distance between two residues is the gap between the C atom of one and the N atom of the other. In both the forward and reverse processes, we treat the motifs as rigid bodies and allow them to float rigidly. B) We reformulate the noise and updates for each motif. C) The TSP preprocessing step. Orange and blue indicate the motifs; the residues in green are the generated scaffold.
Figure 3: Statistical analysis and visualization of generation results for scaffolding 3, 4, and 5 motifs and two large domains of more than 100 residues. A) scTM distribution; samples above the red dashed line are designable. 59.18%, 46.00%, 36.15%, and 60.00% of the generated protein structures are designable when scaffolding 3, 4, and 5 motifs and the two large domains, respectively. B) Generated protein structures. Green indicates the generated scaffolds and the other colors indicate the motifs. The number below each generated structure is its scTM score.
Figure 4: A) The distribution of scTM for inference results at varying lengths; samples above the red dashed line are designable. B) Visualization of results from a model trained on two motifs and tested on five motifs. 'ESMFold' denotes structures predicted by ESMFold from sequences designed with ProteinMPNN; closer structural resemblance to the generated structure is better. The results indicate that the model generalizes to more motifs than it was trained on.
Figure 5: Illustration of notations
...and 7 more figures
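Figure 2 describes ordering the motifs along the sequence by finding the shortest chain with a greedy, TSP-like algorithm, using the gap between the C atom of one residue and the N atom of the next as the distance. The sketch below is a minimal illustration under stated assumptions: each motif is treated as one contiguous segment whose first residue supplies its N atom and whose last residue supplies its C atom, and the greedy rule is a nearest-neighbor heuristic restarted from every motif, keeping the shortest resulting chain. The paper's exact greedy rule may differ, and `gap` and `greedy_chain_order` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def gap(c_atom: Point, n_atom: Point) -> float:
    """Distance from one motif's C-terminal C atom to another motif's N-terminal N atom."""
    return math.dist(c_atom, n_atom)


def greedy_chain_order(n_atoms: Sequence[Point], c_atoms: Sequence[Point]) -> Tuple[List[int], float]:
    """Order motifs into a chain with a nearest-neighbor heuristic tried from every start.

    n_atoms[i]: N atom of motif i's first residue; c_atoms[i]: C atom of its last residue.
    Returns the motif order and its total C-to-N gap.
    """
    k = len(n_atoms)
    best_order: List[int] = []
    best_total = math.inf
    for start in range(k):
        order, total = [start], 0.0
        remaining = set(range(k)) - {start}
        while remaining:
            last = order[-1]
            # Greedily append the motif whose N atom is closest to the current chain end.
            nxt = min(remaining, key=lambda j: gap(c_atoms[last], n_atoms[j]))
            total += gap(c_atoms[last], n_atoms[nxt])
            order.append(nxt)
            remaining.remove(nxt)
        if total < best_total:
            best_order, best_total = order, total
    return best_order, best_total


# Three toy motifs on a line (hypothetical coordinates).
n_atoms = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
c_atoms = [(5.0, 0.0, 0.0), (25.0, 0.0, 0.0), (14.0, 0.0, 0.0)]
order, total = greedy_chain_order(n_atoms, c_atoms)
print(order, total)  # → [0, 2, 1] 9.0
```

Restarting the greedy pass from every motif costs only O(k³) gap evaluations for k motifs, which is negligible for the handful of motifs scaffolded here, and it avoids committing to an arbitrary first motif.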

Theorems & Definitions (5)

Definition 3.1: Protein structure
Definition 3.2: Scaffolded motif and scaffold
Remark 3.3: Motif-scaffolding by inpainting
Remark 3.4: Motif-scaffolding by conditional generation
Definition 3.5: Multiple motif scaffolding problem
