ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design

Keir Adams, Kento Abeywardane, Jenna Fromer, Connor W. Coley

TL;DR

ShEPhERD introduces an SE(3)-equivariant diffusion model that jointly learns the distribution of 3D molecular structures and their interaction profiles (shape, ESP, and directional pharmacophores). By defining explicit 3D representations and tailored similarity scores, it enables unconditional generation and interaction-conditioned inpainting to produce novel molecules with targeted 3D interactions. The framework demonstrates capabilities in natural-product ligand hopping, bioactive hit diversification, and bioisosteric fragment merging, achieving high self-consistency between generated molecules and their interaction profiles while outperforming certain baselines in shape-oriented tasks. This interaction-aware approach offers a versatile platform for ligand-based drug design and can be extended to other interaction-driven molecular design domains such as structure-based design and organocatalyst development.

Abstract

Engineering molecules to exhibit precise 3D intermolecular interactions with their environment forms the basis of chemical design. In ligand-based drug design, bioisosteric analogues of known bioactive hits are often identified by virtually screening chemical libraries with shape, electrostatic, and pharmacophore similarity scoring functions. We instead hypothesize that a generative model which learns the joint distribution over 3D molecular structures and their interaction profiles may facilitate 3D interaction-aware chemical design. We specifically design ShEPhERD, an SE(3)-equivariant diffusion model which jointly diffuses/denoises 3D molecular graphs and representations of their shapes, electrostatic potential surfaces, and (directional) pharmacophores to/from Gaussian noise. Inspired by traditional ligand discovery, we compose 3D similarity scoring functions to assess ShEPhERD's ability to conditionally generate novel molecules with desired interaction profiles. We demonstrate ShEPhERD's potential for impact via exemplary drug design tasks including natural product ligand hopping, protein-blind bioactive hit diversification, and bioisosteric fragment merging.
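
To make "diffuses/denoises ... to/from Gaussian noise" concrete, the following is a minimal sketch of the forward (noising) half of a standard DDPM-style process applied to atomic coordinates only. It is not ShEPhERD's implementation: the linear beta schedule, its endpoints, the step count, and the names `alpha_bar` and `noise_coordinates` are all assumptions chosen for illustration, and the abstract does not say how atom types, bonds, or the interaction-profile representations are noised.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction after t steps of a linear beta schedule.

    The schedule and its endpoints are assumptions, not values from the paper.
    """
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_coordinates(coords, t, rng, num_steps=1000):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    coords: list of (x, y, z) tuples; rng: a random.Random instance.
    """
    a_bar = alpha_bar(t, num_steps)
    signal = math.sqrt(a_bar)
    sigma = math.sqrt(1.0 - a_bar)
    return [
        tuple(signal * x + sigma * rng.gauss(0.0, 1.0) for x in atom)
        for atom in coords
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    toy_molecule = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    for step in (0, 250, 1000):
        print(step, round(alpha_bar(step), 5), noise_coordinates(toy_molecule, step, rng)[0])
```

At `t = 0` the coordinates are returned unchanged, and at the final step almost no signal remains, so the sample is close to pure Gaussian noise. A denoising network is trained to reverse this process; that part is not sketched here.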

Paper Structure

This paper contains 40 sections, 10 equations, 34 figures, 11 tables, 5 algorithms.

Figures (34)

  • Figure 1: We introduce ShEPhERD, a diffusion model that jointly generates 3D molecules and their shapes, electrostatics, and pharmacophores. By explicitly modeling 3D molecular interactions, ShEPhERD can be applied across myriad challenging ligand-based drug design tasks including natural product ligand hopping, bioactive hit diversification, and bioisosteric fragment merging.
  • Figure 2: (Top) Visualization of our 3D point-cloud similarity scoring functions, which evaluate surface/shape, electrostatic, and pharmacophore similarity via weighted Gaussian overlaps. (Middle) ShEPhERD's denoising network architecture, which uses SE(3)-equivariant neural networks to (1) embed noisy input 3D molecules and their interaction profiles into scalar and vector node features, (2) jointly interact the node features, and (3) denoise the input states. (Bottom) ShEPhERD uses inpainting to sample chemically diverse 3D molecules that exhibit desired interaction profiles.
  • Figure 3: (Left) Self-consistency of jointly generated 3D molecules and their shapes, ESP surfaces, or pharmacophores, as assessed via 3D similarity between the generated vs. true interaction profiles of the generated molecules. Shape and ESP consistency are bounded due to randomness in surface sampling. (Right) Distributions of 3D interaction similarities between ShEPhERD-generated or dataset-sampled 3D molecules (post-relaxation and realignment) and 100 target molecules from ShEPhERD-GDB17, including the top-1 scores given 20 samples per target. ShEPhERD generates 3D molecules with low graph similarity to the target molecule, and which have stable geometries as measured by heavy-atom RMSD upon xTB-relaxation. Also shown are top-scoring samples overlaid on their target profiles (surfaces are upsampled for visualization), labeled with 3D similarity scores.
  • Figure 4: (Left) Examples of ShEPhERD-generated analogues of natural product targets, labeled by SA score, ESP similarity, and pharmacophore similarity to the target. Similarities are computed after xTB-relaxation and ESP-optimal realignment. (Right) Distributions of Vina scores for ${\leq}$500 samples from ShEPhERD when conditioning on the bound or lowest-energy pose of co-crystal PDB ligands across 7 proteins. We compare against the Vina scores of 10K virtually screened molecules from ShEPhERD-MOSES-aq. For 5mo4 and 7l11, we show top-scoring ShEPhERD-generated ligands (conditioned on low-energy poses), and overlay a selection from their top-10 docked poses on the PDB ligands. (Bottom) ShEPhERD's bioisosteric fragment merging workflow. We extract the ESP surface and pharmacophores of 13 fragments from a fragment screen, and show ShEPhERD-generated ligands with low SA score and high 3D similarities to the fragments' interaction profiles.
  • Figure 5: Distributions of 3D ESP and pharmacophore similarity to each of three natural product targets for small molecules sampled via ShEPhERD, REINVENT, or virtual screening (VS). Similarity scores are computed after optimizing the molecular geometry with xTB and aligning the structure to the natural product by maximizing ESP similarity. For ShEPhERD and REINVENT, only valid samples with SA score $<4.5$ are included. Also visualized are examples of top-scoring samples generated by ShEPhERD and REINVENT.
  • ...and 29 more figures
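
The Figure 2 caption describes point-cloud similarity scores built from weighted Gaussian overlaps. The sketch below shows one common way such a score can be formed: a first-order Gaussian overlap volume normalized as a Tanimoto ratio. It is an illustration under stated assumptions, not the paper's scoring function: the width `alpha=0.81`, the per-point weights, and the function names are hypothetical, and the paper's specific weighting schemes for electrostatics and directional pharmacophores are not reproduced.

```python
import math


def gaussian_overlap(points_a, points_b, alpha=0.81, weights_a=None, weights_b=None):
    """Sum of pairwise overlap integrals between isotropic Gaussians.

    Each point carries a Gaussian exp(-alpha * |r - p|^2). For two such
    Gaussians a distance d apart, the overlap integral is
    (pi / (2 * alpha))**1.5 * exp(-alpha * d**2 / 2).
    alpha and the optional per-point weights are illustrative assumptions.
    """
    wa = weights_a if weights_a is not None else [1.0] * len(points_a)
    wb = weights_b if weights_b is not None else [1.0] * len(points_b)
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for a, w_a in zip(points_a, wa):
        for b, w_b in zip(points_b, wb):
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))
            total += w_a * w_b * prefactor * math.exp(-0.5 * alpha * d2)
    return total


def tanimoto_similarity(points_a, points_b, alpha=0.81, weights_a=None, weights_b=None):
    """Tanimoto ratio V_AB / (V_AA + V_BB - V_AB) of Gaussian overlap volumes."""
    v_ab = gaussian_overlap(points_a, points_b, alpha, weights_a, weights_b)
    v_aa = gaussian_overlap(points_a, points_a, alpha, weights_a, weights_a)
    v_bb = gaussian_overlap(points_b, points_b, alpha, weights_b, weights_b)
    return v_ab / (v_aa + v_bb - v_ab)


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    shifted = [(x + 0.5, y, z) for x, y, z in cloud]
    print(tanimoto_similarity(cloud, cloud))
    print(tanimoto_similarity(cloud, shifted))
```

The score depends on the relative pose of the two point clouds, which is why the captions above report similarities after realignment. This sketch assumes the clouds are already aligned and does not perform any alignment optimization.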