Physics-informed generative model for drug-like molecule conformers

David C. Williams, Neil Inala

Abstract

We present a diffusion-based, generative model for conformer generation. Our model is focused on the reproduction of bonded structure and is constructed from the associated terms traditionally found in classical force fields to ensure a physically relevant representation. Techniques in deep learning are used to infer atom typing and geometric parameters from a training set. Conformer sampling is achieved by taking advantage of recent advancements in diffusion-based generation. By training on large, synthetic data sets of diverse, drug-like molecules optimized with the semiempirical GFN2-xTB method, high accuracy is achieved for bonded parameters, exceeding that of conventional, knowledge-based methods. Results are also compared to experimental structures from the Protein Data Bank (PDB) and Cambridge Structural Database (CSD).

Paper Structure (22 sections, 24 equations, 45 figures, 6 tables, 1 algorithm)

Figures (45)

  • Figure 1: Force fields typically include bonded terms associated with (a) bond lengths, (b) bond angles, (c) proper torsions, and (d) improper torsions. Each term has an associated subgraph topology and a single characteristic property.
  • Figure 2: A schematic of the denoising model $D$.
  • Figure 3: Characterizing bonded terms using atom distances $|\bm\delta|$. Shown are (a) bonds, (b) bends, and (c) proper torsions.
  • Figure 4: The sign of chirality in terms of an improper torsion angle. The sign depends on whether a neighboring atom is above or below the plane formed by the other three atoms. The sign changes if any two neighboring atoms are swapped.
  • Figure 5: The distribution of molecular weight (left) and logP (right) for two datasets. The logP value is estimated using the Crippen algorithm (Wildman and Crippen, 1999).
  • ...and 40 more figures
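
As an illustration of the characteristic properties named in the captions for Figures 1, 3, and 4, the following is a minimal sketch of how each quantity might be computed from Cartesian coordinates. It is not the authors' implementation: the function names are hypothetical, the torsion sign follows the common atan2 formulation, and the chirality sign is approximated here with a scalar triple product, a stand-in for the improper-torsion sign of Figure 4 that likewise flips when any two neighbors are swapped.

```python
import math

# Small 3-vector helpers on plain tuples (hypothetical names, illustration only).
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def bond_length(p1, p2):
    """Distance between two bonded atoms."""
    return norm(sub(p2, p1))

def bond_angle(p1, p2, p3):
    """Angle in radians at the central atom p2."""
    u, v = sub(p1, p2), sub(p3, p2)
    cosine = dot(u, v) / (norm(u) * norm(v))
    # Clamp to guard against round-off just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))

def proper_torsion(p1, p2, p3, p4):
    """Signed dihedral angle in radians about the p2-p3 bond."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = norm(b2) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)

def chirality_sign(center, a, b, c):
    """Sign of the signed volume spanned by three neighbors of `center`.

    Assumption: a triple-product proxy, not the paper's improper-torsion
    definition. Swapping any two neighbors flips the sign.
    """
    volume = dot(sub(a, center), cross(sub(b, center), sub(c, center)))
    return (volume > 0) - (volume < 0)
```

For example, `proper_torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` describes four atoms whose outer bonds are perpendicular when viewed along the central bond, and `chirality_sign` changes sign when its second and third arguments are exchanged.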