
Sifting through the Noise: A Survey of Diffusion Probabilistic Models and Their Applications to Biomolecules

Trevor Norton, Debswapna Bhattacharya

TL;DR

The paper surveys diffusion probabilistic frameworks and their biomolecular applications, focusing on solving high‑dimensional, multimodal sampling challenges via forward noising and learned denoising. It unifies three core formalisms—DDPM, NCSN, and score-based SDEs—and discusses their extensions to geometry through SE(3) equivariant architectures and diffusion on manifolds, including torsion and orientation spaces. The review catalogs extensive generative tasks (backbone and sequence design, protein complexes) and predictive tasks (side-chains, structure prediction, inverse folding, docking), highlighting state‑of‑the‑art results (e.g., RFdiffusion and DiffDock) and practical design/prediction tradeoffs. It also outlines remaining challenges, notably limited nucleic acid data and benchmark biases, while pointing to complementary approaches such as Bayesian flow networks and flow matching as potential alternatives in the biomolecular design space.

Abstract

Diffusion probabilistic models have made their way into a number of high-profile applications since their inception. In particular, there has been a wave of research into using diffusion models in the prediction and design of biomolecular structures and sequences. Their growing ubiquity makes it imperative for researchers in these fields to understand them. This paper serves as a general overview of the theory behind these models and the current state of research. We first introduce diffusion models and discuss common motifs used when applying them to biomolecules. We then present the significant outcomes achieved through the application of these models in generative and predictive tasks. This survey aims to provide readers with a comprehensive understanding of the increasingly critical role of diffusion models.

Paper Structure (32 sections, 23 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Illustration of a diffusion process in three dimensions. A molecule, such as a protein or RNA, can be represented by a collection of points in Euclidean space. The diffusion process gradually adds noise until the distribution is approximately normal. A diffusion model learns to denoise, so that samples from a normal distribution can be converted back to samples of the original distribution.
  • Figure 2: Timeline of applications of diffusion models to biomolecules.
  • Figure 3: Sample paths for diffusion in $\mathbb T$ and $\mathrm{SO}(3)$. (top) Diffusion in $\mathbb T$ is frequently applied to the torsion angle of four consecutive particles. Since bond lengths and angles are usually fairly rigid, most of the diversity in conformations can be explained by the torsional angles. Here an example torsional angle is perturbed by the diffusion process. (bottom) Diffusion in $\mathrm{SO}(3)$ can be applied to frame data to perturb the orientation. Here the cube's center is not moved while a rigid rotation is applied.
  • Figure 4: Summary of diffusion application for the generation and prediction of biomolecules. Ellipses are used for structure generation/prediction; rectangles for sequences; and rounded rectangles for co-generation methods. Blue shapes represent monomeric applications while orange shapes represent polymeric/complex applications.
  • Figure 5: Generation and prediction tasks differ in how samples are drawn from the distribution. Generation tasks value faithful sampling of the distribution, including all of its modes, whereas prediction tasks value only the most likely outcomes. The left- and right-hand plots show this dichotomy between generation and prediction, respectively. Samples generated by the methods are shown as orange stars.
  • ...and 1 more figure
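
The forward processes described in the captions of Figures 1 and 3 (top) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the linear variance schedule, the number of steps, the toy point cloud, and the function names `forward_noise`, `wrap_angle`, and `noise_torsion` are all hypothetical choices made for the example. The Euclidean part uses the standard DDPM closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$; the torsional part adds Gaussian noise to an angle and wraps it modulo $2\pi$, which is one common way to realize Brownian motion on $\mathbb T$.

```python
import numpy as np


def forward_noise(x0, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I) in one shot."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def wrap_angle(theta):
    """Map angles (in radians) onto the interval [-pi, pi)."""
    return (theta + np.pi) % (2.0 * np.pi) - np.pi


def noise_torsion(theta0, sigma, rng):
    """Perturb torsion angles with Gaussian noise, then wrap back onto the circle."""
    noise = sigma * rng.standard_normal(np.shape(theta0))
    return wrap_angle(theta0 + noise)


rng = np.random.default_rng(0)

# Hypothetical toy "molecule": 50 points in 3D, deliberately far from N(0, I).
x0 = rng.uniform(5.0, 6.0, size=(50, 3))

# Assumed linear variance schedule with 1000 steps (illustrative values only).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)

x_mid = forward_noise(x0, alpha_bars[499], rng)  # partially noised
x_end = forward_noise(x0, alpha_bars[-1], rng)   # approximately standard normal

# Hypothetical torsion angles (radians), noised at an assumed scale sigma = 0.5.
theta0 = np.array([-3.0, 0.0, 1.5, 3.0])
theta_t = noise_torsion(theta0, sigma=0.5, rng=rng)

print("final alpha_bar:", alpha_bars[-1])
print("mean of fully noised coordinates:", x_end.mean())
print("noised torsions:", theta_t)
```

With this schedule the final value of `alpha_bars` is close to zero, so `x_end` retains almost no information about `x0`, matching the caption's statement that noise is added until the distribution is approximately normal. A denoising model would be trained to reverse these steps; that part is omitted here.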