
DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design

Jiaqi Guan, Xiangxin Zhou, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, Quanquan Gu

TL;DR

This paper decomposes the ligand molecule into two parts, namely arms and scaffold, and proposes a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold, to facilitate decomposed generation and improve the properties of the generated molecules.
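As a rough illustration of what "decomposed priors" can mean, the sketch below assumes that each part (every arm and the scaffold) has its own Gaussian prior, with a center and covariance supplied from elsewhere. Initial atom positions are then drawn part by part instead of from a single standard Gaussian. The function name, prior parameters and atom counts are hypothetical; how DecompDiff actually estimates each prior is not described in this summary.

```python
import numpy as np

def sample_decomposed_prior(means, covs, counts, rng):
    """Draw initial ligand coordinates from per-part Gaussian priors (hypothetical sketch).

    means:  list of (3,) arrays, one prior center per part (arms + scaffold)
    covs:   list of (3, 3) covariance matrices, one per part
    counts: number of atoms assigned to each part
    Returns (N, 3) coordinates and an (N,) array of part indices.
    """
    coords, labels = [], []
    for k, (mu, cov, n) in enumerate(zip(means, covs, counts)):
        # Each part's atoms start around that part's own center, not one shared N(0, I).
        coords.append(rng.multivariate_normal(mu, cov, size=n))
        labels.append(np.full(n, k))
    return np.concatenate(coords), np.concatenate(labels)

rng = np.random.default_rng(0)
# Hypothetical priors: two arms near distinct subpockets, scaffold between them.
means = [np.array([4.0, 0.0, 0.0]), np.array([-4.0, 0.0, 0.0]), np.zeros(3)]
covs = [np.eye(3) * 1.5, np.eye(3) * 1.5, np.eye(3) * 2.0]
counts = [6, 5, 8]
x, part = sample_decomposed_prior(means, covs, counts, rng)
print(x.shape, np.bincount(part))  # (19, 3) [6 5 8]
```

Keeping the part index next to each atom is a design choice that lets later steps (for example, arm-specific guidance) know which prior an atom came from.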

Abstract

Designing 3D ligands within a target binding site is a fundamental task in drug discovery. Existing structure-based drug design methods treat all ligand atoms equally, which ignores the different roles of atoms in the ligand for drug design and can be less efficient for exploring the large drug-like molecule space. In this paper, inspired by the convention in pharmaceutical practice, we decompose the ligand molecule into two parts, namely arms and scaffold, and propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold. To facilitate the decomposed generation and improve the properties of the generated molecules, we incorporate both bond diffusion in the model and additional validity guidance in the sampling phase. Extensive experiments on CrossDocked2020 show that our approach achieves state-of-the-art performance in generating high-affinity molecules while maintaining proper molecular properties and conformational stability, with up to -8.39 Avg. Vina Dock score and a 24.5% Success Rate. The code is available at https://github.com/bytedance/DecompDiff
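The abstract mentions validity guidance in the sampling phase, and the Figure 2 caption states that it alleviates protein-ligand clashes. The following minimal sketch shows one way a clash term could act as guidance. It assumes a squared-hinge penalty on ligand-protein distances below a hypothetical cutoff `r_min` and a plain gradient step on the ligand coordinates. It omits the arm-scaffold connection term, and the cutoff, step size and function names are illustrative rather than DecompDiff's settings.

```python
import numpy as np

def clash_penalty_and_grad(lig, prot, r_min=2.0):
    """Penalty sum(relu(r_min - d)^2) over ligand-protein pairs and its gradient wrt lig."""
    diff = lig[:, None, :] - prot[None, :, :]      # (N, M, 3) pairwise offsets
    d = np.linalg.norm(diff, axis=-1)               # (N, M) pairwise distances
    overlap = np.maximum(r_min - d, 0.0)            # zero once atoms are far enough apart
    loss = float(np.sum(overlap ** 2))
    # d/dx_i of (r_min - d_ij)^2 = -2 * overlap_ij * (x_i - p_j) / d_ij
    safe_d = np.maximum(d, 1e-8)                    # avoid division by zero for coincident atoms
    grad = -2.0 * np.sum((overlap / safe_d)[..., None] * diff, axis=1)
    return loss, grad

def apply_clash_guidance(lig, prot, step=0.1, r_min=2.0):
    """One gradient-descent step that nudges clashing ligand atoms away from the protein."""
    _, grad = clash_penalty_and_grad(lig, prot, r_min)
    return lig - step * grad

prot = np.array([[0.0, 0.0, 0.0]])
lig = np.array([[1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])  # first atom clashes, second does not
new_lig = apply_clash_guidance(lig, prot)
```

Only the clashing atom moves (outward along the protein-ligand axis). Atoms already beyond `r_min` receive zero gradient, so the guidance leaves them untouched.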

Paper Structure

This paper contains 32 sections, 1 theorem, 31 equations, 8 figures, 8 tables, and 1 algorithm.

Key Result

Proposition 3.1

Let $-\mathrm{ELBO}_{\mathrm{decomp}}(\bm{\theta})$ and $-\mathrm{ELBO}_{\mathrm{standard}}(\bm{\theta})$ denote the $-\mathrm{ELBO}$ losses under the decomposed prior and the standard Gaussian prior, respectively. Suppose that $f_{\bm{\theta}}$ is a simple graph neural network with a …
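The statement of Proposition 3.1 is cut off in this summary, so its exact claim is not reproduced here. As a generic illustration, and not the paper's proof, the prior-matching KL term in a diffusion ELBO between isotropic Gaussians has the closed form $\mathrm{KL}\big(\mathcal{N}(\mu_q, \sigma_q^2 I_d)\,\|\,\mathcal{N}(\mu_p, \sigma_p^2 I_d)\big) = \tfrac{1}{2}\big[d(\sigma_q^2/\sigma_p^2 - 1 - \ln(\sigma_q^2/\sigma_p^2)) + \|\mu_q - \mu_p\|^2/\sigma_p^2\big]$. The sketch assumes that form and shows that a prior centered near the data gives a smaller term than one centered at the origin.

```python
import numpy as np

def kl_isotropic_gaussians(mu_q, var_q, mu_p, var_p):
    """Closed-form KL( N(mu_q, var_q * I) || N(mu_p, var_p * I) ) in d dimensions."""
    mu_q, mu_p = np.asarray(mu_q, dtype=float), np.asarray(mu_p, dtype=float)
    d = mu_q.size
    ratio = var_q / var_p
    return float(0.5 * (d * (ratio - 1.0 - np.log(ratio))
                        + np.sum((mu_q - mu_p) ** 2) / var_p))

# Hypothetical terminal distribution q centered 3 units along x.
q_mu, q_var = [3.0, 0.0, 0.0], 1.0
print(kl_isotropic_gaussians(q_mu, q_var, [0.0, 0.0, 0.0], 1.0))  # 4.5 (standard prior)
print(kl_isotropic_gaussians(q_mu, q_var, [3.0, 0.0, 0.0], 1.0))  # 0.0 (prior centered on q)
```

With equal variances, only the squared distance between means contributes. This is the intuition behind placing prior centers near where arms and scaffold are expected.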

Figures (8)

  • Figure 1: Ligand molecules can be decomposed into arms and scaffold. Using MDM2 as an example, small-molecule ligands are collected and displayed in the upper panel (colors represent different atom types). The ligand atoms are separated into arms and scaffold based on their distance to the protein surface. Arms (lower right) form direct contact with the target, while the scaffold (lower left) connects the arms together. Arm atoms are further clustered based on their positions, and the clusters (colored atom groups) show strong shape complementarity with local subpockets.
  • Figure 2: Overview of the sampling process of DecompDiff. (a) The initial atoms are sampled from informative decomposed priors. (b) An equivariant network on heterogeneous graphs denoises atom coordinates, atom types and bond types simultaneously. (c) The validity guidance alleviates the protein-ligand clash problem and encourages arms and scaffold to connect.
  • Figure 3: Comparison of the distributions of carbon-carbon pair distances (left) and all-atom pair distances (right) for reference molecules in the test set and for model-generated molecules. The Jensen-Shannon divergence (JSD) between the two distributions is reported.
  • Figure 4: Ablation study on the number of diffusion steps. We compare our models with TargetDiff [guan20233d] in terms of best validation loss, QED, SA, and Vina Score under different numbers of diffusion steps.
  • Figure 5: Visualization of reference binding molecules (left column), molecules generated by TargetDiff [guan20233d] (middle column), and molecules generated by our model (right column) with only $200$ sampling steps, on proteins 4H3C (top row) and 2F2C (bottom row).
  • ...and 3 more figures
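The Figure 1 caption describes two steps: splitting ligand atoms into arms and scaffold by their distance to the protein surface, then clustering arm atoms by position. The following hedged sketch uses the distance to the nearest protein atom as a stand-in for surface distance. It also substitutes simple single-linkage clustering (connected components under a distance cutoff) for whatever clustering DecompDiff uses. Both cutoffs and the function names are hypothetical.

```python
import numpy as np

def split_arms_scaffold(lig, prot, contact_cutoff=4.0):
    """True = arm (within contact_cutoff of some protein atom), False = scaffold."""
    d = np.linalg.norm(lig[:, None, :] - prot[None, :, :], axis=-1)  # (N, M)
    return d.min(axis=1) < contact_cutoff

def cluster_positions(points, link_cutoff=2.0):
    """Single-linkage clustering: atoms closer than link_cutoff end up in the same cluster."""
    n = len(points)
    adj = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1) < link_cutoff
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue  # already assigned to a cluster
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over the neighbor graph
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy example: a linear ligand with two protein atoms near its two ends.
lig = np.array([[x, 0.0, 0.0] for x in (0.0, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0)])
prot = np.array([[0.0, 3.0, 0.0], [9.0, 3.0, 0.0]])
arms = split_arms_scaffold(lig, prot)
print(arms.tolist())                           # [True, True, False, False, False, True, True]
print(cluster_positions(lig[arms]).tolist())   # [0, 0, 1, 1]
```

The two end groups touch the protein and form two separate arms, while the middle atoms, which are farther than the cutoff from every protein atom, form the scaffold that connects them.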

Theorems & Definitions (1)

  • Proposition 3.1