Protein structure generation via folding diffusion
Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, James Y. Zou, Alex X. Lu, Ava P. Amini
TL;DR
This work introduces FoldingDiff, a diffusion-based model that generates protein backbone structures directly by diffusing over an internal representation of six angles per residue, which removes the need for equivariant networks operating on 3D coordinates. Using a vanilla transformer as the denoiser and a wrapped-normal forward process, the model learns to produce realistic angle configurations that recapitulate native angle distributions and Ramachandran patterns, including chirality. In evaluation, approximately 22.7% of generated backbones are designable (scTM ≥ 0.5), diversity is consistent across replicates, and high reconstruction accuracy (average TM-score > 0.95) indicates faithful geometry. The work provides open-source code and trained models and highlights a scalable, biologically inspired path for de novo protein design, while outlining avenues to extend length, complexity, and functional integration.
Abstract
The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting as-yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a new diffusion-based generative model that designs protein backbone structures via a procedure that mirrors the native folding process. We describe protein backbone structure as a series of consecutive angles capturing the relative orientation of the constituent amino acid residues, and generate new structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins biologically twist into energetically favorable conformations, but the inherent shift and rotational invariance of this representation also alleviates the need for complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally occurring proteins. As a useful resource, we release the first open-source codebase and trained models for protein structure diffusion.
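To make the angle-based forward process concrete, the following is a minimal sketch of how noise could be added to a backbone stored as six angles per residue, with every noised value wrapped back into [-π, π). It is an illustration under stated assumptions rather than the released implementation: the function names, the `alpha_bar` parameter (the cumulative signal fraction at a given timestep), and the toy input are all hypothetical.

```python
import math
import random


def wrap(angle):
    """Map an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def forward_noise(angles, alpha_bar, rng):
    """Sample a noised copy of `angles` in one step, then wrap each value.

    Assumed form (standard DDPM closed-form sampling plus wrapping):
        x_t = wrap(sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps),
        eps ~ N(0, 1)
    `angles` is a list of residues, each a list of six angles in radians.
    `alpha_bar` is a hypothetical schedule value in [0, 1].
    """
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [
        [wrap(signal * a + noise * rng.gauss(0.0, 1.0)) for a in residue]
        for residue in angles
    ]


# Toy input: 3 residues x 6 angles, invented for illustration only.
rng = random.Random(0)
x0 = [[rng.uniform(-math.pi, math.pi) for _ in range(6)] for _ in range(3)]

x_clean = forward_noise(x0, alpha_bar=1.0, rng=rng)  # no noise added
x_noisy = forward_noise(x0, alpha_bar=0.0, rng=rng)  # pure wrapped noise
```

Because each residue is described only by angles relative to its neighbors, translating or rotating the whole protein leaves the input unchanged, which is why a plain transformer can serve as the denoiser in this setting.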
