
Unified all-atom molecule generation with neural fields

Matthieu Kirchmeyer, Pedro O. Pinheiro, Emma Willett, Karolis Martinkus, Joseph Kleinhenz, Emily K. Makowski, Andrew M. Watkins, Vladimir Gligorijevic, Richard Bonneau, Saeed Saremi

TL;DR

FuncBind addresses the fragmentation of structure-based design across molecular modalities with a modality-agnostic, all-atom generation framework built on neural-field representations $v:\mathbb{R}^3\rightarrow[0,1]^n$ and a latent score-based generator. It learns a spatial latent map from binder–target complexes and trains a conditional denoiser to generate new density fields conditioned on target structure, modality, and noise level. Samples are drawn via diffusion or via Walk-Jump Sampling with preconditioning. Results on a new macrocyclic peptide (MCP) benchmark (186,685 complexes containing many non-canonical amino acids) and in vitro antibody redesign experiments show competitive in silico performance and experimental validation, including novel binders to rigid epitopes. Overall, FuncBind shows the potential of a unified neural-field approach to cross-modality molecular design, with implications for accelerating discovery across small molecules, peptides, and biologics; future work includes scaling and extension to additional modalities.
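
The TL;DR mentions sampling with Walk-Jump Sampling (WJS). As a rough illustration of the walk/jump idea only, the sketch below "walks" with Langevin MCMC on the noisy (smoothed) density, using the score implied by a denoiser, score(y) = (x_hat(y) - y) / sigma^2, and then "jumps" once to the denoised estimate x_hat(y). Several assumptions are loud here: a closed-form Gaussian denoiser stands in for FuncBind's learned conditional denoiser, plain overdamped Langevin replaces whatever integrator the paper uses, preconditioning is omitted, and the function names are hypothetical.

```python
import numpy as np

def make_gaussian_denoiser(mu, s, sigma):
    """Toy stand-in (assumption): data x ~ N(mu, s^2 I) and y = x + sigma * eps,
    so the posterior mean E[x | y] has a closed form and plays the role of x_hat(y)."""
    shrink = s**2 / (s**2 + sigma**2)

    def denoise(y):
        return mu + shrink * (y - mu)

    return denoise

def walk_jump_sample(denoise, sigma, y0, n_steps=500, step=0.05, rng=None):
    """Walk: unadjusted overdamped Langevin on p_sigma(y), with the score
    (x_hat(y) - y) / sigma^2 derived from the denoiser.
    Jump: return the denoised estimate x_hat(y) of the final walker state."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.array(y0, dtype=float)
    for _ in range(n_steps):
        score = (denoise(y) - y) / sigma**2
        y = y + step * score + np.sqrt(2.0 * step) * rng.standard_normal(y.shape)
    return denoise(y)
```

Running many independent chains at once (for example `y0 = np.zeros((2000, 3))`) vectorizes the walk. Because the walk happens at a single fixed noise level and the jump is only applied at the end, the walker never has to move through the sharp, unsmoothed density directly.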

Abstract

Generative models for structure-based drug design are often limited to a specific modality, restricting their broader applicability. To address this challenge, we introduce FuncBind, a framework built on computer vision techniques to generate target-conditioned, all-atom molecules across atomic systems. FuncBind uses neural fields to represent molecules as continuous atomic densities and employs score-based generative models with modern architectures adapted from the computer vision literature. This modality-agnostic representation allows a single unified model to be trained on diverse atomic systems, from small to large molecules, and to handle variable atom/residue counts, including non-canonical amino acids. FuncBind achieves competitive in silico performance in generating small molecules, macrocyclic peptides, and antibody complementarity-determining region (CDR) loops, conditioned on target structures. FuncBind also generated novel antibody binders, validated in vitro, via de novo redesign of the CDR H3 loop of two chosen co-crystal structures. As a final contribution, we introduce a new dataset and benchmark for structure-conditioned macrocyclic peptide generation. The code is available at https://github.com/prescient-design/funcbind.
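
The abstract describes molecules as continuous atomic densities, i.e. a field $v:\mathbb{R}^3\rightarrow[0,1]^n$. A minimal sketch of such a field follows, assuming one channel per atom type and an isotropic Gaussian bump per atom, combined with a per-channel max so values stay in [0, 1]. The bump shape, the radius, and the name `atomic_density` are illustrative assumptions, not FuncBind's exact definition.

```python
import numpy as np

def atomic_density(points, coords, types, n_channels, radius=0.5):
    """Evaluate a toy density field v: R^3 -> [0, 1]^n at query points.

    points: (P, 3) query coordinates; coords: (A, 3) atom positions;
    types: (A,) integer channel index (atom type) for each atom.
    """
    # Squared distance from every query point to every atom: shape (P, A).
    d2 = ((points[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    bumps = np.exp(-d2 / (2.0 * radius**2))  # each bump peaks at 1 on its atom
    field = np.zeros((points.shape[0], n_channels))
    for c in range(n_channels):
        mask = types == c
        if mask.any():
            # Max over atoms of this type keeps the channel inside [0, 1].
            field[:, c] = bumps[:, mask].max(axis=1)
    return field
```

Nothing in this function depends on the number of atoms A, which is one way to see why a field representation accommodates variable atom or residue counts. Evaluating it on a regular grid of points yields a voxelized density.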

Paper Structure

This paper contains 35 sections, 8 equations, 13 figures, and 10 tables.

Figures (13)

  • Figure 1: Neural field architectures. (a) Architecture used in kirchmeyer2024funcmol where a global embedding is used as input to the neural field decoder. (b) Our proposed neural field architecture, where embeddings are spatially arranged into a feature map grid. The latter allows us to better capture local signal information from input space and is compatible with expressive architectures for denoising.
  • Figure 2: Conditional denoiser training overview. We separately voxelize the binder $v$ and the target $v_{\rm tar}$ of a given complex and encode them into $z$ and $z^{\rm tar}$ using encoders $E_\psi$ and $E_{\psi^\prime}$, respectively. We train a denoiser $\hat{z}_\theta(y\mid z^{\rm tar},\sigma,c)$ to remove the noise from $y$ conditioned on $z^{\rm tar}$, the noise level $\sigma$, and the one-hot modality class $c$ (e.g., cyclic peptide). The denoised latent representation is fed into a neural field decoder $D_\phi$, which gives a reconstructed field $\hat{v}$. $\hat{v}$ is then postprocessed to recover bonds and residue identities (where applicable); see the section on OpenBabel postprocessing.
  • Figure 3: Examples of generated molecules given a target structure for different modalities: (top) small molecules against 2rma, (middle) macrocyclic peptides against 5ooc and (bottom) CDR H3 loop against 5tlk. The seed binders are shown on the right.
  • Figure 4: Histograms of CDR H3 length (left) and atom count (right) for the de novo 4cni target. Red marks the values of the seed H3.
  • Figure 5: Per-residue energy scores at the same position, computed with Rosetta's residue energy breakdown for a seed and two samples: (a) the seed's serine, (b) 3-hydroxycyclopentyl-alanine (C1O) from sample 11 (see the novel NCAA figure in the MCP results appendix), and (c) tyrosine from sample 35.
  • ...and 8 more figures
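
Figure 1(b) arranges the latent embeddings on a spatial grid instead of using a single global embedding. One simple way to let a coordinate-based decoder read such a grid, assumed here purely for illustration, is to trilinearly interpolate a local feature at each query point and pass it, together with the coordinates, through a small MLP with a sigmoid output so that each channel of v(x) lies in [0, 1]. The interpolation scheme, layer sizes, and function names are hypothetical; the paper's decoder may condition on the grid differently.

```python
import numpy as np

def trilinear_features(grid, points):
    """Trilinearly interpolate a latent grid of shape (D, H, W, F) at points in [0, 1]^3.
    Each spatial dimension of the grid must have at least 2 cells."""
    dims = np.array(grid.shape[:3])
    idx = points * (dims - 1)  # continuous grid coordinates
    # Lower corner of the enclosing cell, clipped so corner + 1 stays in range.
    i0 = np.clip(np.floor(idx).astype(int), 0, dims - 2)
    t = idx - i0  # fractional offsets in [0, 1]
    out = np.zeros((points.shape[0], grid.shape[3]))
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((t[:, 0] if dz else 1 - t[:, 0])
                     * (t[:, 1] if dy else 1 - t[:, 1])
                     * (t[:, 2] if dx else 1 - t[:, 2]))
                out += w[:, None] * grid[i0[:, 0] + dz, i0[:, 1] + dy, i0[:, 2] + dx]
    return out

def decode_field(grid, points, w1, b1, w2, b2):
    """Hypothetical decoder: local latent feature + coordinates -> MLP -> sigmoid."""
    h = np.concatenate([trilinear_features(grid, points), points], axis=1)
    h = np.maximum(h @ w1 + b1, 0.0)  # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))  # values in [0, 1]^n
```

Because each query only reads the few latents around it, nearby atoms are described by nearby latents; this is one concrete reading of the caption's claim that a spatial grid captures local signal better than a global embedding.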
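
The training step in Figure 2 amounts to a denoising regression in latent space: corrupt the binder latent $z$ into $y$ and ask $\hat{z}_\theta(y\mid z^{\rm tar},\sigma,c)$ to recover $z$. A minimal sketch of one loss evaluation follows, assuming Gaussian noise $y = z + \sigma\epsilon$ and an unweighted mean squared error. The loss weighting, how $\sigma$ is sampled during training, and the `denoiser` callable signature are assumptions; in FuncBind the denoiser is a learned network.

```python
import numpy as np

def denoising_loss(denoiser, z, z_tar, modality, n_modalities, sigma, rng):
    """One Monte Carlo estimate of mean ||z_hat(y | z_tar, sigma, c) - z||^2
    with y = z + sigma * eps (hypothetical helper, not the paper's code)."""
    c = np.eye(n_modalities)[modality]             # one-hot modality class
    y = z + sigma * rng.standard_normal(z.shape)   # noisy binder latent
    z_hat = denoiser(y, z_tar, sigma, c)           # conditional denoiser output
    return np.mean((z_hat - z) ** 2)
```

As a sanity check, a denoiser that returns `y` unchanged ignores all conditioning and scores about sigma**2, so a useful model must beat that baseline by exploiting the target latent and the modality class.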