3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction

Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, Jianzhu Ma

TL;DR

TargetDiff is an SE(3)-equivariant diffusion model that non-autoregressively generates 3D atom coordinates and discrete atom types for target-aware molecule design, conditioned on protein pockets. By modeling both coordinates and atom types with an SE(3)-equivariant GNN, it produces realistic 3D structures and improved binding energies on CrossDocked2020, while keeping the training and sampling procedures aligned. The framework also derives unsupervised affinity signals from the denoised features to improve binding affinity ranking and prediction on downstream benchmarks such as PDBBind. Overall, TargetDiff advances realistic 3D drug design by jointly handling geometry, chemistry, and target context in a scalable, equivariant diffusion setting.

Abstract

Rich data and powerful machine learning models allow us to design drugs for a specific protein target in silico. Recently, the inclusion of 3D structures during targeted drug design has shown superior performance to target-free models, as the atomic interactions in 3D space are explicitly modeled. However, current 3D target-aware models rely on either voxelized atom densities or an autoregressive sampling process; these are not equivariant to rotation or easily violate geometric constraints, resulting in unrealistic structures. In this work, we develop a 3D equivariant diffusion model to solve the above challenges. To achieve target-aware molecule design, our method learns a joint generative process of both continuous atom coordinates and categorical atom types with an SE(3)-equivariant network. Moreover, we show that our model can serve as an unsupervised feature extractor to estimate the binding affinity under proper parameterization, which provides an effective way for drug screening. To assess our model, we propose a comprehensive framework that evaluates the quality of sampled molecules from different dimensions. Empirical studies show our model can generate molecules with more realistic 3D structures and better affinities towards the protein targets, and improve binding affinity ranking and prediction without retraining.
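
To make the phrase "joint generative process of both continuous atom coordinates and categorical atom types" more concrete, the following is a minimal sketch of what a joint forward (noising) step could look like. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `alpha_bar` signal-retention parameter, and the choice of uniform resampling for atom types are all hypothetical and are not taken from the text above.

```python
import math
import random


def noise_coords(x0, alpha_bar, rng):
    """Gaussian noising of coordinates (assumed form):
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, I)."""
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [[scale * c + sigma * rng.gauss(0.0, 1.0) for c in atom] for atom in x0]


def noise_types(v0, alpha_bar, num_types, rng):
    """Categorical noising of atom types (assumed form): keep each type with
    probability alpha_bar, otherwise resample uniformly over all types."""
    noisy = []
    for v in v0:
        if rng.random() < alpha_bar:
            noisy.append(v)
        else:
            noisy.append(rng.randrange(num_types))
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    coords = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]  # toy ligand with two atoms
    types = [0, 2]                                 # hypothetical type indices
    for alpha_bar in (1.0, 0.5, 0.0):
        print(alpha_bar, noise_coords(coords, alpha_bar, rng),
              noise_types(types, alpha_bar, 3, rng))
```

With `alpha_bar = 1.0` the molecule is returned unchanged, and with `alpha_bar = 0.0` the coordinates are pure Gaussian noise and the types are uniform draws, which is the sense in which noise is injected gradually.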

Paper Structure

This paper contains 41 sections, 1 theorem, 22 equations, 16 figures, 1 table, 2 algorithms.

Key Result

Proposition 1

Denoting the SE(3)-transformation as $T_g$, we can achieve an invariant likelihood w.r.t. $T_g$ on the protein-ligand complex: $p_\theta(T_g(M_0 \mid \mathcal{P})) = p_\theta(M_0 \mid \mathcal{P})$ if we shift the Center of Mass (CoM) of protein atoms to zero and parameterize the Markov transiti…
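
The role of the CoM shift in the proposition can be checked numerically on a toy example. The sketch below is only an illustration under assumptions: `toy_equivariant_update` is a hypothetical stand-in for an equivariant transition (it is not the network from the paper), all atoms are given equal mass, and the rotation is restricted to the z-axis for brevity. It verifies that transforming the whole complex and then centering and updating gives the same result as centering, updating, and then rotating.

```python
import math


def center_on_protein(protein, ligand):
    """Shift both point sets so the protein CoM (equal masses assumed) is the origin."""
    n = len(protein)
    com = [sum(p[k] for p in protein) / n for k in range(3)]

    def shift(points):
        return [[p[k] - com[k] for k in range(3)] for p in points]

    return shift(protein), shift(ligand)


def rotate_z(points, theta):
    """Rotate points about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in points]


def translate(points, t):
    return [[p[k] + t[k] for k in range(3)] for p in points]


def toy_equivariant_update(protein, ligand, step=0.1):
    """Hypothetical equivariant step: move each ligand atom along relative
    vectors to protein atoms, weighted by an invariant function of distance."""
    updated = []
    for a in ligand:
        delta = [0.0, 0.0, 0.0]
        for p in protein:
            d2 = sum((a[k] - p[k]) ** 2 for k in range(3))
            w = 1.0 / (1.0 + d2)  # depends on distance only, so it is invariant
            for k in range(3):
                delta[k] += w * (p[k] - a[k])
        updated.append([a[k] + step * delta[k] for k in range(3)])
    return updated


def max_abs_diff(a, b):
    return max(abs(a[i][k] - b[i][k]) for i in range(len(a)) for k in range(3))


if __name__ == "__main__":
    protein = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]]
    ligand = [[0.5, 0.5, 0.5], [2.0, -1.0, 0.3]]
    theta, t = 0.7, [3.0, -2.0, 5.0]

    # Path A: center, update, then rotate the result.
    p_a, l_a = center_on_protein(protein, ligand)
    out_a = rotate_z(toy_equivariant_update(p_a, l_a), theta)

    # Path B: rotate and translate the whole complex, then center and update.
    p_b, l_b = center_on_protein(translate(rotate_z(protein, theta), t),
                                 translate(rotate_z(ligand, theta), t))
    out_b = toy_equivariant_update(p_b, l_b)

    print("max deviation:", max_abs_diff(out_a, out_b))
```

Centering on the protein CoM removes the translation, and building the update from relative vectors with distance-based weights makes it commute with rotation; the two paths should agree up to floating-point error.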

Figures (16)

  • Figure 1: Overview of TargetDiff. The diffusion process gradually injects noise to the data, and the generative process learns to recover the data distribution from the noise distribution with a network parameterized by $\theta$.
  • Figure 2: Comparing the distribution for distances of all-atom (top row) and carbon-carbon pairs (bottom row) for reference molecules in the test set (gray) and model generated molecules (color). Jensen-Shannon divergence (JSD) between two distributions is reported.
  • Figure 3: Jensen-Shannon divergence between the distributions of bond distance for reference vs. generated molecules. "-", "=", and ":" represent single, double, and aromatic bonds, respectively. A lower value is better.
  • Figure 4: Median RMSD for rigid fragment before and after the force-field optimization.
  • Figure 5: Percentage of different ring sizes for reference and model generated molecules.
  • ...and 11 more figures
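
Figures 2 and 3 report the Jensen-Shannon divergence (JSD) between distance distributions of reference and generated molecules. A minimal sketch of that kind of comparison is given below, assuming histogram-based distributions and base-2 logarithms (so the JSD lies in [0, 1]). The bin range, bin count, and function names are illustrative choices, not values from the paper, and the paper's exact JSD convention is not stated in the text above.

```python
import math


def pairwise_distances(coords):
    """All unique inter-atomic distances within one molecule."""
    return [math.dist(coords[i], coords[j])
            for i in range(len(coords)) for j in range(i + 1, len(coords))]


def histogram(values, lo, hi, bins):
    """Normalized histogram over [lo, hi); values outside the range are dropped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            idx = min(int((v - lo) / width), bins - 1)  # guard against rounding
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logarithms (assumed convention)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Toy coordinates standing in for a reference and a generated molecule.
    reference = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
    generated = [[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [2.9, 0.2, 0.0]]
    p = histogram(pairwise_distances(reference), 0.0, 4.0, 8)
    q = histogram(pairwise_distances(generated), 0.0, 4.0, 8)
    print("JSD:", jsd(p, q))
```

A lower value means the generated distance distribution is closer to the reference one; identical histograms give 0 and histograms with no overlapping bins give 1 under this convention.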

Theorems & Definitions (1)

  • Proposition 1