Diffusion Models in $\textit{De Novo}$ Drug Design

Amira Alakhdar, Barnabas Poczos, Newell Washburn

TL;DR

This review surveys diffusion-model frameworks for 3D molecular generation in de novo drug design, emphasizing forward/reverse diffusion, E(3)/SE(3) equivariance, and representation choices (SMILES, 2D/3D graphs). It catalogs DDPMs, SGMs, and Score SDEs, detailing their training losses and sampling approaches, and maps a spectrum of reverse denoising architectures from GNNs and CNNs to transformers and hybrids. It highlights structure-aware conditioning for structure-based drug design, reviews diverse applications (fragment/linker design, conformations, docking, MD), and evaluates performance with a broad set of metrics, while acknowledging limitations in chirality handling, data availability, benchmarking, and computational cost. The article concludes that diffusion models have substantial potential to accelerate drug discovery, provided challenges around data, evaluation consistency, and physical validity are addressed. Overall, the work clarifies technical pathways and practical considerations for applying diffusion models to 3D molecular design in pharmaceutical research.

Abstract

Diffusion models have emerged as powerful tools for molecular generation, particularly for 3D molecular structures. Inspired by non-equilibrium statistical physics, these models can generate 3D molecular structures with the specific properties or requirements that are crucial to drug discovery. Diffusion models have been particularly successful at learning the complex probability distributions of 3D molecular geometries and their corresponding chemical and physical properties through forward and reverse diffusion processes. This review focuses on the technical implementation of diffusion models tailored for 3D molecular generation. It compares the performance, evaluation methods, and implementation details of various diffusion models used for molecular generation tasks. We cover strategies for atom and bond representation, architectures of reverse diffusion denoising networks, and challenges associated with generating stable 3D molecular structures. This review also explores the applications of diffusion models in $\textit{de novo}$ drug design and related areas of computational chemistry, such as structure-based drug design, including target-specific molecular generation, molecular docking, and molecular dynamics of protein-ligand complexes. We also cover conditional generation on physical properties, conformation generation, and fragment-based drug design. By summarizing the state-of-the-art diffusion models for 3D molecular generation, this review sheds light on their role in advancing drug discovery as well as on their current limitations.

Paper Structure (43 sections, 18 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Overview of the process of generating molecules using diffusion models. First, the relevant dataset is acquired, the molecules are expressed in an appropriate molecular representation, and the diffusion conditions are determined. Next, a diffusion framework (DDPM, SGM, or Score SDE) is selected, and the forward and reverse diffusion strategies are designed. Denoising architectures may include transformers, GNNs, CNNs, and hybrid architectures. Finally, the generated molecules are evaluated with metrics chosen for the specific task in the drug discovery process [After Pang_Qiao_Zeng_Zou_Wei_2023].
  • Figure 2: Overview of the diffusion process applied to 3D molecules. In the forward diffusion process, noise is gradually added to the molecules by sampling from $q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1-\beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbb{I})$, where $\beta_t \in (0, 1)$ is a noise-schedule hyperparameter specified before model training, $\mathbb{I}$ is the identity matrix, and $t \in \{1, 2, \dots, T\}$ is the time step. To generate molecules, one starts from standard normal noise $\mathbf{x}_T$ and iteratively draws samples from the distributions $p_{\theta}(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$, which are learned by a trained denoising neural network.
  • Figure 3: Overview of the latent diffusion process. First, the molecules are encoded into a continuous latent space, and diffusion is applied in that latent space. To generate molecules, latent representations are sampled through reverse diffusion and then mapped back to the original molecular representation by the decoder [After xu2023geometric].
  • Figure 4: Simple illustrations of the neural network architectures commonly used in reverse diffusion: (A) transformers, (B) GNNs, and (C) 3D CNNs.
  • Figure 5: Generation of 3D molecules conditioned on protein pocket using diffusion models.
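The forward and reverse processes summarized in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal, hedged illustration rather than the implementation of any model covered in the review: the linear schedule (`beta_start=1e-4`, `beta_end=0.02`), the noise-prediction parameterization of the reverse step, the variance choice $\sigma_t^2 = \beta_t$, and all function names are assumptions, and the trained denoising network is replaced by a noise prediction supplied by the caller.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Assumed linear schedule for beta_1..beta_T; reviewed models use various schedules.
    return np.linspace(beta_start, beta_end, T)

def forward_step(x_prev, beta_t, rng):
    # One forward step: sample x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * rng.standard_normal(x_prev.shape)

def q_sample(x0, t, alpha_bar, eps):
    # Closed-form jump from x_0 to x_t, where alpha_bar_t = prod_{s<=t} (1 - beta_s).
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def reverse_step(x_t, t, eps_pred, betas, alpha_bar, rng):
    # One ancestral DDPM step x_t -> x_{t-1}, given a predicted noise eps_pred
    # (normally produced by the denoising network) and sigma_t^2 = beta_t.
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(1.0 - betas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

# Toy usage: array index t = 0..T-1 corresponds to time steps 1..T in the caption.
rng = np.random.default_rng(0)
T = 1000
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.standard_normal((5, 3))   # hypothetical "molecule": 5 atoms with 3D coordinates
x = x0.copy()
for beta_t in betas:               # T sequential forward steps, as in the caption
    x = forward_step(x, beta_t, rng)

eps = rng.standard_normal(x0.shape)
x_T = q_sample(x0, T - 1, alpha_bar, eps)  # same marginal, reached in one jump
print(alpha_bar[-1] < 1e-3)                # True: x_T is essentially standard normal noise
```

With a perfect noise prediction at the first time step (index 0), `reverse_step` recovers $\mathbf{x}_0$ exactly, which makes a convenient sanity check when wiring a real denoiser into the sampling loop.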
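The TL;DR stresses E(3)/SE(3) equivariance, and Figure 4 lists GNNs among the denoising architectures. The sketch below is a simplified, hypothetical message-passing layer in the spirit of the E(n)-equivariant GNN of Satorras et al.: messages depend only on atom features and squared interatomic distances, and coordinates are updated along relative position vectors. As a result, rotating, reflecting, or translating the input coordinates transforms the output coordinates in the same way while leaving the atom features unchanged. The function name, weight shapes, and the single tanh layer standing in for an MLP are illustrative assumptions, not the architecture of any specific model in the review.

```python
import numpy as np

def egnn_layer(x, h, W_e, w_x):
    # x: (N, 3) coordinates; h: (N, F) invariant atom features.
    # W_e: (2F + 1, F) edge weights; w_x: (F,) coordinate-update weights (hypothetical).
    n, f = h.shape
    diff = x[:, None, :] - x[None, :, :]             # (N, N, 3) relative vectors x_i - x_j
    d2 = np.sum(diff ** 2, axis=-1, keepdims=True)   # (N, N, 1) squared distances, E(3)-invariant
    hi = np.broadcast_to(h[:, None, :], (n, n, f))
    hj = np.broadcast_to(h[None, :, :], (n, n, f))
    m = np.tanh(np.concatenate([hi, hj, d2], axis=-1) @ W_e)  # (N, N, F) invariant messages
    m = m * (1.0 - np.eye(n))[:, :, None]            # drop self-edges
    coef = m @ w_x                                   # (N, N) scalar weight per edge
    # Equivariant coordinate update: weighted sum of relative vectors.
    x_new = x + np.sum(diff * coef[:, :, None], axis=1) / (n - 1)
    h_new = h + m.sum(axis=1)                        # invariant feature update
    return x_new, h_new

rng = np.random.default_rng(0)
N, F = 6, 4
x = rng.standard_normal((N, 3))
h = rng.standard_normal((N, F))
W_e = 0.5 * rng.standard_normal((2 * F + 1, F))
w_x = 0.5 * rng.standard_normal(F)

Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # random orthogonal matrix (rotation/reflection)
t = rng.standard_normal(3)                          # random translation

x1, h1 = egnn_layer(x, h, W_e, w_x)
x2, h2 = egnn_layer(x @ Q.T + t, h, W_e, w_x)
print(np.allclose(x2, x1 @ Q.T + t), np.allclose(h2, h1))  # True True
```

The final check is the property that lets an equivariant denoiser predict updates on 3D coordinates consistently, regardless of the molecule's arbitrary orientation or position in space.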