Score-based 3D molecule generation with neural fields

Matthieu Kirchmeyer, Pedro O. Pinheiro, Saeed Saremi

TL;DR

We address unconditional 3D molecule generation by representing molecules as continuous atomic occupancy fields and decoding via a shared neural field conditioned on a per-molecule code. The method, FuncMol, uses a score-based generative pipeline with Neural Empirical Bayes to learn a denoiser and perform walk-jump sampling in latent space, followed by a continuous refinement step to recover atomic coordinates. This yields an all-atom generation framework that is compact, scalable, and capable of handling large molecules such as macrocyclic peptides, with competitive quality on standard benchmarks and at least an order-of-magnitude faster sampling than voxel-based baselines. The approach offers flexibility for auto-encoding/decoding variants and potential extensions to conditional generation and broader field-based molecular design tasks.
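The field representation can be illustrated by evaluating a multi-channel atomic occupancy field at arbitrary 3D query points, with one channel per atom type. This is a minimal sketch under stated assumptions: the Gaussian kernel, the width `sigma`, and the `1 - prod(1 - bump)` rule for combining atoms are illustrative choices and are not necessarily the paper's exact definition. `atomic_occupancy_field` is a hypothetical helper.

```python
import numpy as np

def atomic_occupancy_field(query_points, atom_coords, atom_types, n_types, sigma=0.5):
    """Evaluate a per-atom-type occupancy field at arbitrary 3D query points.

    Assumption (illustrative only): each atom contributes an isotropic Gaussian
    bump of width `sigma`, and the occupancy of channel c is
    1 - prod_i (1 - bump_i) over the atoms of type c, which keeps values in [0, 1].
    """
    query_points = np.asarray(query_points, dtype=float)  # (Q, 3)
    atom_coords = np.asarray(atom_coords, dtype=float)    # (A, 3)
    atom_types = np.asarray(atom_types)                   # (A,)
    # Squared distance between every query point and every atom: (Q, A)
    d2 = ((query_points[:, None, :] - atom_coords[None, :, :]) ** 2).sum(axis=-1)
    bumps = np.exp(-d2 / (2.0 * sigma ** 2))
    field = np.zeros((query_points.shape[0], n_types))
    for c in range(n_types):
        mask = atom_types == c
        if mask.any():
            # Channel c is "occupied" if any atom of type c is nearby
            field[:, c] = 1.0 - np.prod(1.0 - bumps[:, mask], axis=1)
    return field

# Usage: an atom of type 0 at the origin and an atom of type 1 at 1.2 Å along x
field = atomic_occupancy_field(
    query_points=[[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]],
    atom_coords=[[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]],
    atom_types=[0, 1],
    n_types=2,
)
```

Because such a field can be queried at any coordinate rather than only on a fixed voxel grid, a conditional neural field $f_\phi(x, z)$ can be fit to regress values like these at sampled points.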

Abstract

We introduce a new representation for 3D molecules based on their continuous atomic density fields. Using this representation, we propose a new model based on walk-jump sampling for unconditional 3D molecule generation in continuous space using neural fields. Our model, FuncMol, encodes molecular fields into latent codes using a conditional neural field, samples noisy codes from a Gaussian-smoothed distribution with Langevin MCMC (walk), denoises these samples in a single step (jump), and finally decodes them into molecular fields. Unlike most approaches, FuncMol performs all-atom generation of 3D molecules without assumptions on the molecular structure and scales well with molecule size. Our method achieves competitive results on drug-like molecules and easily scales to macrocyclic peptides, with sampling at least one order of magnitude faster than voxel-based baselines. The code is available at https://github.com/prescient-design/funcmol.
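The walk-jump idea can be illustrated on a toy problem where the smoothed score is known in closed form. In FuncMol the score $g_\theta$ is learned with Neural Empirical Bayes; in this sketch it is replaced by the analytic score of a Gaussian, and the walk uses plain overdamped (unadjusted) Langevin steps. The discretization, step size, and the helper names `smoothed_score` and `walk_jump` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def smoothed_score(y, mu, s, sigma):
    # Toy stand-in for the learned score g_theta (assumption): if clean codes are
    # z ~ N(mu, s^2 I), noisy codes y = z + sigma * eps are N(mu, (s^2 + sigma^2) I),
    # whose score grad_y log p_sigma(y) is available in closed form.
    return -(y - mu) / (s ** 2 + sigma ** 2)

def walk_jump(n_chains, dim, mu, s, sigma, n_steps=500, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.normal(size=(n_chains, dim)) * sigma  # initial noisy codes
    # Walk: unadjusted (overdamped) Langevin MCMC on the smoothed density p_sigma
    for _ in range(n_steps):
        noise = rng.normal(size=y.shape)
        y = y + step * smoothed_score(y, mu, s, sigma) + np.sqrt(2.0 * step) * noise
    # Jump: one-step empirical Bayes (Tweedie) estimate E[z | y]
    z_hat = y + sigma ** 2 * smoothed_score(y, mu, s, sigma)
    return y, z_hat

# Usage: sample 2D toy "codes" centered at mu
mu = np.array([1.0, -2.0])
y, z_hat = walk_jump(n_chains=4000, dim=2, mu=mu, s=0.5, sigma=1.0)
```

The jump computes $\hat{z} = y + \sigma^2 \nabla_y \log p_\sigma(y)$, so all denoising happens in a single step after the chain has mixed in the smoother noisy space. Because this estimate is a posterior mean, the jumped samples are more concentrated than the clean distribution in this toy example.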

Paper Structure (48 sections, 10 equations, 14 figures, 10 tables, 4 algorithms)

Figures (14)

  • Figure 1: (a) A conditional neural field encodes a molecular field $v$ into a low-dimensional latent code $z$. (b) Using a learned score function $g_\theta$, FuncMol samples in latent space via Langevin MCMC; the sampled codes are then decoded back into molecules.
  • Figure 2: Conditional neural field $f_\phi$ using the multiplicative filter network architecture. (a) The model takes a latent code $z$ and a coordinate $x$ as input and outputs the occupancy field of the corresponding molecule at that location, $f_\phi(x, z)$. (b) The code and coordinates are processed via FiLM layers and Hadamard products. We denote the overall operation at layer $l$ as $H^{(l)}$.
  • Figure 3: Qualitative evaluation on CREMP following grambow2023ringer. Left: distributions of the bond angles ($\theta_1$, $\theta_2$, $\theta_3$) and dihedral angles ($\phi$, $\psi$, $\omega$) of each residue for the reference test set (gray) and the generated samples (blue). The KL divergence is computed as $\text{KL}(\text{test}\,\|\,\text{sampled})$. Right: Ramachandran plots ramachandran1968conformation (colored by density; darker tones indicate higher-density regions).
  • Figure 4: Auto-encoding approach for the neural field representation. A voxelized representation of a molecule is encoded into the latent space $z$ with a 3D CNN. The latent code is then decoded with a conditional MFN at any point $x$ in space.
  • Figure 5: Interpolation in the latent modulation space for different pairs of molecules from GEOM-drugs. Each interpolated code is projected back onto the learned manifold of molecules via a noise/denoise operation. FuncMol produces semantically meaningful interpolations, and molecules close in latent space share similar structure.
  • ...and 9 more figures
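Figure 2 describes the conditional neural field $f_\phi$ as a multiplicative filter network (MFN) whose layers combine FiLM conditioning on the code $z$ with Hadamard products. The NumPy sketch below shows that structure under illustrative assumptions: sine filters (FourierNet-style), FiLM scale-and-shift applied to each filter's pre-activation, random untrained weights, and a sigmoid output so occupancies lie in $(0, 1)$. The class `ConditionalMFN` and all sizes are hypothetical, and the paper's exact layer layout may differ.

```python
import numpy as np

class ConditionalMFN:
    """Hypothetical conditional multiplicative filter network f_phi(x, z).

    Recursion: h_1 = g_1(x, z);  h_{l+1} = (W_l h_l + b_l) * g_{l+1}(x, z);
    output = sigmoid(W_out h_L + b_out), one occupancy value per atom channel.
    """

    def __init__(self, code_dim, hidden=64, n_layers=3, n_channels=5, seed=0):
        rng = np.random.default_rng(seed)
        # Sine filter parameters (frequencies, phases) for each layer
        self.filters = [(rng.normal(size=(3, hidden)) * 2.0,
                         rng.uniform(-np.pi, np.pi, size=hidden)) for _ in range(n_layers)]
        # Linear maps between consecutive multiplicative layers
        self.linears = [(rng.normal(size=(hidden, hidden)) / np.sqrt(hidden),
                         np.zeros(hidden)) for _ in range(n_layers - 1)]
        # FiLM projections: latent code -> per-unit scale and shift
        self.films = [(rng.normal(size=(code_dim, hidden)) * 0.1,
                       rng.normal(size=(code_dim, hidden)) * 0.1) for _ in range(n_layers)]
        self.out = (rng.normal(size=(hidden, n_channels)) / np.sqrt(hidden), np.zeros(n_channels))

    def _filter(self, l, x, z):
        omega, phi = self.filters[l]
        w_scale, w_shift = self.films[l]
        pre = x @ omega + phi                                # (Q, hidden)
        pre = pre * (1.0 + z @ w_scale) + z @ w_shift        # FiLM modulation by the code
        return np.sin(pre)

    def __call__(self, x, z):
        # x: (Q, 3) query coordinates, z: (code_dim,) latent code
        h = self._filter(0, x, z)
        for l, (w, b) in enumerate(self.linears, start=1):
            h = (h @ w + b) * self._filter(l, x, z)          # Hadamard product with the next filter
        w_out, b_out = self.out
        return 1.0 / (1.0 + np.exp(-(h @ w_out + b_out)))    # (Q, n_channels) in (0, 1)

# Usage: query 100 points for one latent code
model = ConditionalMFN(code_dim=16)
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(100, 3))
z1, z2 = rng.normal(size=16), rng.normal(size=16)
occupancy = model(x, z1)
```

Sharing one network across molecules and changing only $z$ is what keeps the representation compact: each molecule is stored as a short code, while the coordinates $x$ can be chosen freely at query time.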