DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

Xiangxin Zhou, Xiwei Cheng, Yuwei Yang, Yu Bao, Liang Wang, Quanquan Gu

TL;DR

DecompOpt presents a new generation paradigm which combines optimization with conditional diffusion models to achieve desired properties while adhering to the molecular grammar, and offers a unified framework covering both de novo design and controllable generation.

Abstract

Recently, 3D generative models have shown promising performance in structure-based drug design by learning to generate ligands given target binding sites. However, modeling only the target-ligand distribution can hardly fulfill one of the main goals of drug discovery: designing novel ligands with desired properties, e.g., high binding affinity and ease of synthesis. This challenge becomes particularly pronounced when the target-ligand pairs used for training do not align with these desired properties. Moreover, most existing methods aim to solve the de novo design task, while many generative scenarios requiring flexible controllability, such as R-group optimization and scaffold hopping, have received little attention. In this work, we propose DecompOpt, a structure-based molecular optimization method based on a controllable and decomposed diffusion model. DecompOpt presents a new generation paradigm which combines optimization with conditional diffusion models to achieve desired properties while adhering to the molecular grammar. Additionally, DecompOpt offers a unified framework covering both de novo design and controllable generation. To achieve this, ligands are decomposed into substructures, which allows fine-grained control and local optimization. Experiments show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines and demonstrates great potential in controllable generation tasks.

Paper Structure (41 sections, 10 equations, 15 figures, 14 tables, 1 algorithm)

Figures (15)

  • Figure 1: Distribution of Vina scores for protein-ligand pairs in the CrossDocked2020 dataset. The red vertical line marks $-8.18$ kcal/mol, a commonly used value representing moderate binding affinity.
  • Figure 2: Illustration of DecompOpt. In each iteration of optimization: (1) For each subpocket, a reference arm is sampled from the ordered arm list. (2) The controllable and decomposed diffusion model generates ligand molecules based on arm (and subpocket) conditions. (3) The generated ligand molecules are collected and further decomposed into scaffolds and arms. (4) Poor arms in the ordered arm lists are replaced with new arms that show better properties.
  • Figure 3: Visualization of reference binding molecules (left column) and molecules generated by DecompOpt (middle and right columns) with 30 rounds of optimization on proteins 3DAF (top row) and 4F1M (bottom row). Optimized R-groups are highlighted in red.
  • Figure 5: Illustration of training. In this example, three arm-subpocket pairs are input to the condition encoders separately. For brevity, only one pair is plotted.
  • Figure 6: Illustration of the sampling.
  • ...and 10 more figures
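
The four steps in the Figure 2 caption can be read as a loop over per-subpocket arm lists. The sketch below is only an illustration of that loop under stated assumptions, not the authors' implementation: the names optimize_arms, generate, decompose, and score are hypothetical, arms are represented by plain floats, the reference arm is drawn uniformly, and "replacing poor arms" is implemented as merge-then-keep-the-top-k. In DecompOpt the generate step would be the conditional diffusion model; here it is a callable supplied by the caller.

```python
import random
from typing import Callable, Dict, List, Tuple

Arm = float                            # toy stand-in for an arm substructure (assumption)
ScoredArm = Tuple[float, Arm]          # (score, arm); higher score is better in this sketch
ArmLists = Dict[int, List[ScoredArm]]  # subpocket index -> ordered arm list


def optimize_arms(
    arm_lists: ArmLists,
    generate: Callable[[Dict[int, Arm]], Dict[int, Arm]],
    decompose: Callable[[Dict[int, Arm]], Dict[int, Arm]],
    score: Callable[[Arm], float],
    n_rounds: int,
    n_samples: int,
    top_k: int,
    rng: random.Random,
) -> ArmLists:
    """Hypothetical arm-list optimization loop following the Figure 2 steps."""
    by_score = lambda pair: pair[0]
    # Work on sorted copies so the caller's lists are left untouched.
    lists = {k: sorted(v, key=by_score, reverse=True)[:top_k]
             for k, v in arm_lists.items()}
    for _ in range(n_rounds):
        candidates: ArmLists = {k: [] for k in lists}
        for _ in range(n_samples):
            # (1) Sample one reference arm per subpocket (uniformly: an assumption).
            refs = {k: rng.choice(v)[1] for k, v in lists.items()}
            # (2) Generate a ligand conditioned on the reference arms.
            ligand = generate(refs)
            # (3) Decompose the ligand back into per-subpocket arms and score them.
            for k, arm in decompose(ligand).items():
                candidates[k].append((score(arm), arm))
        # (4) Replace poor arms: merge old and new, keep the best top_k.
        for k in lists:
            merged = sorted(lists[k] + candidates[k], key=by_score, reverse=True)
            lists[k] = merged[:top_k]
    return lists


# Toy usage: "generation" perturbs each reference arm with Gaussian noise,
# "decomposition" is the identity, and the score is the arm value itself.
rng = random.Random(0)
initial = {0: [(0.0, 0.0)], 1: [(0.0, 0.0)]}
result = optimize_arms(
    initial,
    generate=lambda refs: {k: arm + rng.gauss(0.0, 1.0) for k, arm in refs.items()},
    decompose=lambda ligand: dict(ligand),
    score=lambda arm: arm,
    n_rounds=5, n_samples=8, top_k=3, rng=rng,
)
```

Because each round keeps the best top_k of the old and new arms, the best score in every list can never get worse from one round to the next. With a lower-is-better metric such as the Vina score, the sort order would simply be reversed.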