Sifting through the Noise: A Survey of Diffusion Probabilistic Models and Their Applications to Biomolecules
Trevor Norton, Debswapna Bhattacharya
TL;DR
The paper surveys diffusion probabilistic models and their biomolecular applications, focusing on how forward noising and learned denoising address high-dimensional, multimodal sampling problems. It unifies three core formalisms (DDPM, NCSN, and score-based SDEs) and discusses their extension to geometric data through SE(3)-equivariant architectures and diffusion on manifolds, including torsion and orientation spaces. The review catalogs generative tasks (backbone and sequence design, protein complexes) and predictive tasks (side chains, structure prediction, inverse folding, docking), highlighting state-of-the-art results (e.g., RFdiffusion and DiffDock) and practical tradeoffs in design and prediction. It also outlines remaining challenges, notably limited nucleic acid data and benchmark biases, and points to Bayesian flow networks and flow matching as potential alternatives for biomolecular design.
Abstract
Diffusion probabilistic models have made their way into a number of high-profile applications since their inception. In particular, there has been a wave of research into using diffusion models in the prediction and design of biomolecular structures and sequences. Their growing ubiquity makes it imperative for researchers in these fields to understand them. This paper serves as a general overview of the theory behind these models and the current state of research. We first introduce diffusion models and discuss common motifs used when applying them to biomolecules. We then present the significant outcomes achieved through the application of these models in generative and predictive tasks. This survey aims to provide readers with a comprehensive understanding of the increasingly critical role of diffusion models.
