DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding

Haitao Lin, Yufei Huang, Odin Zhang, Siqi Ma, Meng Liu, Xuanjing Li, Lirong Wu, Jishui Wang, Tingjun Hou, Stan Z. Li

TL;DR

DiffBP tackles the challenge of designing molecules that bind to a target protein by modeling the full-atom joint distribution with a non-autoregressive, diffusion-based framework conditioned on the protein pocket. It uses SE(3)-equivariant graph denoisers (EGNN/GVP) and a forward-backward diffusion process over both continuous atom coordinates and discrete atom types, with a zero center-of-mass (CoM) trick to handle translations. Key contributions are (i) a target-aware diffusion process, (ii) an SE(3)-equivariant denoiser, (iii) a multi-term optimization objective (L_pos, L_type, L_reg, L_rec), and (iv) pre-generation strategies for molecule size and CoM. Experiments on CrossDocked2020 show competitive binding affinity and drug-like properties, along with insights into molecule size distributions and sub-structure patterns, positioning DiffBP as a physics-consistent alternative to autoregressive structure-based drug design (SBDD) methods.
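
The forward (noising) half of the diffusion process over coordinates and atom types can be sketched as follows. This is a minimal illustration under stated assumptions, not DiffBP's implementation: the cosine schedule in `alpha_bar`, the unweighted mean used as the CoM, and the uniform-transition categorical process for atom types are all assumptions chosen for clarity, and every function name here is hypothetical. Both the clean coordinates and the Gaussian noise are projected onto the zero-mean subspace, so the noised molecule never acquires a net translation.

```python
import math
import random

def alpha_bar(t, T, s=0.008):
    """Cumulative signal fraction at step t. Assumption: cosine schedule; DiffBP's may differ."""
    def f(u):
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)

def center(xs):
    """Shift 3D points so their unweighted mean (used here as the CoM) is the origin."""
    n = len(xs)
    mean = [sum(p[d] for p in xs) / n for d in range(3)]
    return [[p[d] - mean[d] for d in range(3)] for p in xs]

def noise_positions(x0, t, T, rng):
    """Sample x_t ~ q(x_t | x_0); data and noise both live in the zero-CoM subspace."""
    a = alpha_bar(t, T)
    x0c = center(x0)
    # Draw isotropic Gaussian noise, then project it onto the zero-mean subspace.
    eps = center([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in x0])
    xt = [[math.sqrt(a) * p[d] + math.sqrt(1.0 - a) * e[d] for d in range(3)]
          for p, e in zip(x0c, eps)]
    return xt, eps

def noise_types(a0, t, T, K, rng):
    """Noise integer atom types (0..K-1). Assumption: uniform-transition categorical process,
    i.e. keep the type with probability alpha_bar_t, otherwise resample uniformly."""
    a = alpha_bar(t, T)
    probs = [[a * (k == ai) + (1.0 - a) / K for k in range(K)] for ai in a0]
    at = [rng.choices(range(K), weights=row)[0] for row in probs]
    return at, probs
```

Because both terms of x_t have zero mean, x_t stays centered at every step. At t = 0 the functions return the centered input coordinates and the original atom types unchanged, which makes a quick sanity check.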

Abstract

Generating molecules that bind to specific proteins is an important but challenging task in drug discovery. Previous works usually generate atoms auto-regressively, producing the element type and 3D coordinates of one atom at a time. However, in real-world molecular systems the interactions among atoms are global, so the energy function is pair-coupled across all atoms of a molecule. From this energy-based perspective, the probability should be modeled as a joint distribution rather than as a sequence of conditionals. Sequential auto-regressive generation is therefore unnatural and likely to violate physical rules, resulting in generated molecules with poor properties. In this work, we establish a generative diffusion model for 3D molecular structures that uses the target protein as a contextual constraint and operates at the full-atom level in a non-autoregressive way. Given a designated 3D protein binding site, our model learns a generative process that denoises both the element types and the 3D coordinates of an entire molecule with an equivariant network. Experimentally, the proposed method performs competitively with prevailing methods, generating molecules with high affinity to the target proteins, appropriate sizes, and good drug properties such as drug-likeness.
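
The equivariance property that the denoising network relies on can be illustrated with a single, heavily simplified EGNN-style layer (after Satorras et al., 2021; EGNN is one of the denoiser backbones named in the TL;DR). This is a toy sketch, not DiffBP's network: it assumes one scalar feature per atom, fixed hand-picked weights in place of learned MLPs, and the hypothetical function name `egnn_layer`. Messages depend only on features and pairwise squared distances, which are invariant, and coordinates move along weighted relative vectors x_i - x_j, so rotating and translating the input rotates and translates the output coordinates while leaving the features unchanged.

```python
import math

def egnn_layer(h, x, w=(0.5, -0.3, 0.8, 0.1), wx=0.2, wh=(1.0, 0.5)):
    """One toy EGNN-style update with a single scalar feature per atom.

    h: list of floats (invariant atom features); x: list of [x, y, z] coordinates.
    w, wx, wh are fixed illustrative weights standing in for learned MLPs (hypothetical).
    Returns (new_h, new_x).
    """
    n = len(x)
    new_h, new_x = [], []
    for i in range(n):
        agg_m = 0.0                  # sum of incoming messages (invariant)
        shift = [0.0, 0.0, 0.0]      # coordinate update (equivariant)
        for j in range(n):
            if i == j:
                continue
            diff = [x[i][d] - x[j][d] for d in range(3)]
            dist2 = sum(c * c for c in diff)  # unchanged by rotations and translations
            # Edge message depends only on invariant quantities.
            m_ij = math.tanh(w[0] * h[i] + w[1] * h[j] + w[2] * dist2 + w[3])
            coef = wx * m_ij  # scalar weight on the relative vector x_i - x_j
            for d in range(3):
                shift[d] += coef * diff[d]
            agg_m += m_ij
        norm = max(n - 1, 1)         # mean aggregation over the other atoms
        new_x.append([x[i][d] + shift[d] / norm for d in range(3)])
        new_h.append(math.tanh(wh[0] * h[i] + wh[1] * agg_m))
    return new_h, new_x
```

To check it, apply a rotation R and translation t to the input coordinates, run the layer on both inputs, and confirm that the features match and the new coordinates equal R x' + t up to floating-point error. EGNN is in fact E(3)-equivariant (it also commutes with reflections), which implies the SE(3) equivariance the paper requires.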

Paper Structure (47 sections, 33 equations, 3 figures, 7 tables, 2 algorithms)

Figures (3)

  • Figure 1: Overall framework of DiffBP, illustrating its workflow.
  • Figure 2: Visualization of the generation process for two molecules (affinity scores 4.583 and 5.682) that bind to the proteins '1afs_A_rec' and '4azf_A_rec', respectively.
  • Figure 3: Validity of generated samples vs. $\rho$ and $\gamma$.