Structure-based drug design by denoising voxel grids

Pedro O. Pinheiro, Arian Jamasb, Omar Mahmood, Vishnu Sresht, Saeed Saremi

TL;DR

VoxBind is a voxel-based, structure-conditioned 3D molecule generator for structure-based drug design (SBDD). It extends neural empirical Bayes to the conditional setting and generates molecules with conditional walk-jump sampling. Ligands are represented as voxelized atomic densities, and a conditional denoiser built on a 3D U-Net estimates clean ligands from noisy ones given the protein pocket. Sampling decouples a Langevin MCMC walk over noisy ligands from single-step Bayes-estimator jumps to clean ones, which keeps it efficient. On CrossDocked2020, VoxBind achieves higher binding affinity, better drug-likeness properties, lower steric strain, fewer clashes, and substantially faster sampling than state-of-the-art point-cloud diffusion baselines. The results indicate that voxel representations combined with score-based denoising can rival or surpass current conditional 3D molecule generation methods while simplifying training and accelerating sampling. This supports more scalable pocket-directed design and allows flexible initialization strategies in practical drug design workflows.

Abstract

We present VoxBind, a new score-based generative model for 3D molecules conditioned on protein structures. Our approach represents molecules as 3D atomic density grids and leverages a 3D voxel-denoising network for learning and generation. We extend the neural empirical Bayes formalism (Saremi & Hyvarinen, 2019) to the conditional setting and generate structure-conditioned molecules with a two-step procedure: (i) sample noisy molecules from the Gaussian-smoothed conditional distribution with underdamped Langevin MCMC using the learned score function and (ii) estimate clean molecules from the noisy samples with single-step denoising. Compared to the current state of the art, our model is simpler to train, significantly faster to sample from, and achieves better results on extensive in silico benchmarks -- the generated molecules are more diverse, exhibit fewer steric clashes, and bind with higher affinity to protein pockets. The code is available at https://github.com/genentech/voxbind/.
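
The two-step procedure described in the abstract can be illustrated on a toy problem. The sketch below is not the VoxBind implementation: it replaces the learned score network with the analytic score of a hypothetical Gaussian model (clean samples scattered around a "pocket" vector `xi` with spread `S_DATA`), and it uses a plain Euler-type discretization of underdamped Langevin dynamics, which may differ from the discretization used in the paper. The names `score`, `walk_jump`, `S_DATA`, `delta`, and `gamma` are assumptions made for illustration.

```python
import numpy as np

S_DATA = 0.1  # hypothetical spread of clean samples around the pocket vector


def score(y, xi, sigma):
    # Analytic conditional score grad_y log p(y | xi) for the toy model
    # x | xi ~ N(xi, S_DATA^2 I), y = x + sigma * eps.
    # In a real model this would be a learned network.
    return -(y - xi) / (S_DATA**2 + sigma**2)


def walk_jump(xi, sigma=0.9, n_steps=500, jump_every=100,
              delta=0.1, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    y = sigma * rng.standard_normal(xi.shape)  # start the chain from noise
    v = np.zeros_like(y)                       # velocity of the underdamped chain
    estimates = []
    for k in range(1, n_steps + 1):
        # Walk: one underdamped Langevin step on the noisy variable y.
        noise = rng.standard_normal(y.shape)
        v = v + delta * (score(y, xi, sigma) - gamma * v) \
              + np.sqrt(2.0 * gamma * delta) * noise
        y = y + delta * v
        # Jump: single-step denoising; the chain itself is left unchanged.
        if k % jump_every == 0:
            estimates.append(y + sigma**2 * score(y, xi, sigma))
    return np.stack(estimates)


xi = np.array([1.0, -2.0, 0.5])  # toy stand-in for a pocket
x_hat = walk_jump(xi)
print(x_hat.shape)
```

Because the jump does not modify `y`, clean estimates can be read off the same chain at any interval, which is what makes the walk and the jump decoupled.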

Paper Structure

This paper contains 33 sections, 1 theorem, 12 equations, 16 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Given the factorized Gaussian noise process, the conditional Bayes estimator can be written in closed form in terms of the conditional score function.
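
The closed form itself is not reproduced on this page. Assuming additive isotropic Gaussian noise $y = x + \sigma\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$ applied to the ligand independently of the pocket $\xi$, the standard result from neural empirical Bayes (the Miyasawa/Tweedie identity) carries over to the conditional case and would read as follows; the paper's own notation may differ:

$$\hat{x}(y, \xi) \;=\; \mathbb{E}[\,X \mid y, \xi\,] \;=\; y + \sigma^2\, \nabla_y \log p(y \mid \xi).$$

In words, the least-squares estimate of the clean ligand is the noisy sample shifted along the conditional score, scaled by the noise variance.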

Figures (16)

  • Figure 1: We are interested in sampling from $p(x|\xi)$, the distribution of ligands given pocket $\xi$. This is challenging due to the high-dimensionality of the data. Therefore, instead of (a) sampling directly from this distribution, we generate ligands in a two-step procedure: (b) sample $y$ from the Gaussian-smoothed distribution $p(y|\xi)$ and (c) estimate the ligand $\hat{x}$ from $y$ and $\xi$.
  • Figure 2: Conditional denoiser architecture. Given a ligand-pocket complex sample, we discretize each molecule, resulting in the voxelized ligand $x$ and pocket $\xi$. The ligand is corrupted by Gaussian noise with noise level $\sigma$. The corrupted ligand and the pocket are encoded into a common embedding space (with the same spatial dimensions as the inputs) by the encoders $E_{\rm lig}$ and $E_{\rm poc}$, respectively. The two representations are added together and passed through a 3D U-Net $U$ to recover the clean version of the ligand. To facilitate visualization, we threshold the grid values, $\hat{x}=\mathbb{1}_{\ge 0.1}(\hat{x})$.
  • Figure 3: Illustration of a pocket-conditional walk-jump sampling chain. (a) First, we voxelize a given protein binding pocket. (b) Then, we sample noisy voxelized ligands (given the pocket) with Langevin MCMC and estimate clean samples with the estimator. (c) Finally, we recover the atomic coordinates from the voxel grids. In this figure, jumps are performed every $\Delta k=100$ walk steps.
  • Figure 4: Examples of generated ligands $\hat{x}_{i,k}$ given pocket $\xi_i$. Each row represents a single chain of samples for a given protein pocket (1E3R, 2I2Z, 5CRZ from top to bottom). For each generated sample, we show the ligand-pocket complex and the generated voxelized molecule. The samples in each row are generated from the same MCMC chain. The provided ground-truth ligands are shown in the last column.
  • Figure 5: Median VinaScore and VinaDock (score on generated and redocked poses, respectively) of all generated molecules for each target on the test set (lower is better). Pockets are sorted by VoxBind$_{\sigma=0.9}$'s score.
  • ...and 11 more figures
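
The voxelization, noise corruption, and thresholding steps mentioned in the Figure 2 caption can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's code: the Gaussian density form, the grid size, the resolution, the atomic `radius`, and the function name `voxelize` are all hypothetical choices. Only the noise level $\sigma=0.9$ and the visualization threshold of 0.1 are taken from the captions above.

```python
import numpy as np


def voxelize(coords, channels, n_channels,
             grid_size=32, resolution=0.5, radius=0.5):
    # Render each atom as a Gaussian-shaped density on a cubic grid centered
    # at the origin. One channel per atom type; output shape (C, G, G, G).
    axis = (np.arange(grid_size) - (grid_size - 1) / 2) * resolution
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.zeros((n_channels, grid_size, grid_size, grid_size))
    for (x, y, z), c in zip(coords, channels):
        d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        # Keep the largest density where atoms of the same type overlap.
        grid[c] = np.maximum(grid[c], np.exp(-d2 / radius**2))
    return grid


rng = np.random.default_rng(0)
coords = np.array([[0.25, 0.25, 0.25],
                   [1.75, 0.25, 0.25]])  # hypothetical atom positions
channels = [0, 1]                        # hypothetical atom-type indices
x = voxelize(coords, channels, n_channels=2)

sigma = 0.9
y = x + sigma * rng.standard_normal(x.shape)  # noisy ligand fed to the denoiser
x_vis = (x >= 0.1).astype(np.float32)         # thresholded grid for plotting
```

A denoiser trained on pairs like `(y, x)`, together with the voxelized pocket, is what the sampling chain in Figure 3 would query at every walk step.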

Theorems & Definitions (2)

  • Proposition 1
  • proof