Structure-Based Drug Design via 3D Molecular Generative Pre-training and Sampling

Yuwei Yang; Siqi Ouyang; Xueyu Hu; Mingyue Zheng; Hao Zhou; Lei Li

TL;DR

MolEdit3D tackles the challenge of structure-based drug design by aligning the action space with the objective: editing in 3D space via a fragment-based graph editor. It combines a generative pre-training stage on abundant 3D ligands with a target-guided Bayesian sampling procedure that optimizes a composite objective including binding affinity and drug-likeness, refined by self-learning from generated data. The approach yields state-of-the-art results on key metrics such as validity, success rate, binding affinity, and synthetic accessibility across multiple targets, while maintaining realistic 2D ring systems and conformational torsions. This method promises practical gains in designing novel, synthesizable ligands with strong target affinity by exploiting 3D editing and continual self-improvement.

Abstract

Structure-based drug design aims at generating high-affinity ligands with prior knowledge of 3D target structures. Existing methods either use a conditional generative model to learn the distribution of 3D ligands given target binding sites, or iteratively modify molecules to optimize a structure-based activity estimator. The former is highly constrained by data quantity and quality, which leaves optimization-based approaches more promising in practical scenarios. However, existing optimization-based approaches edit molecules in 2D space and use molecular docking to estimate activity from docking-predicted 3D target-ligand complexes. This misalignment between the action space and the objective hinders the performance of these models, especially those that employ deep learning for acceleration. In this work, we propose MolEdit3D to combine 3D molecular generation with optimization frameworks. We develop a novel 3D graph editing model that generates molecules from fragments, and pre-train this model on abundant 3D ligands to learn target-independent properties. We then employ a target-guided self-learning strategy to improve target-related properties using self-sampled molecules. MolEdit3D achieves state-of-the-art performance on the majority of the evaluation metrics, and demonstrates a strong capability for capturing both target-dependent and target-independent properties.

Paper Structure (29 sections, 11 equations, 5 figures, 3 tables)

Introduction
Related Work
The Proposed Method
3D Graph Editing Model
Parameterization of Editing Operations
Parameterization of Hierarchical Graphs
Generative Pre-training for 3D Graph Editing Model
Target-Guided Bayesian Sampling with Self-Learning
Results and Discussions
Experiments
Model Details
Evaluation
Results and Analysis
Main Result
Additional Target-Independent Properties
...and 14 more sections

Figures (5)

Figure 1: Model Overview. MolEdit3D contains three components. The 3D graph editing model predicts geometric edits that either add a rigid fragment to, or delete one from, the skeleton molecule. For an add operation, the skeleton molecule is linked to a rigid fragment using the predicted attaching sites and torsion angle (defined by four consecutive atoms, $w$, $v$, $v'$ and $w'$). For a delete operation, the predicted bond is broken. With the editing model, we use generative pre-training to reconstruct 3D ligands for learning target-independent properties. The model is further fine-tuned using a target-guided self-learning strategy, which uses self-generated molecules with improved target-related properties to enhance target-awareness.
Figure 2: Target binding poses of MolEdit3D-generated molecules for the 5MKU and 3VRJ proteins. The generated molecules demonstrate high Vina score, QED and SAscore.
Figure 3: Angular distribution comparison for CCCC (upper panel) and Cccc (lower panel) torsion angles between CrossDocked2020 reference molecules and model-generated molecules. DESERT and MolEdit3D show better overlap with the reference distribution.
Figure 4: Frequency of different ring sizes for DrugCentral reference molecules and model-generated molecules. MolEdit3D has the best overlap with the reference frequency.
Figure 5: Mining 3D Rigid Molecular Fragments. 1) Non-terminal single bonds are broken to create 2D fragments and broken bonds are labeled as editable sites; 2) 3D conformations are generated for each fragment.
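
The torsion angle mentioned in the Figure 1 caption is defined by four consecutive atoms $w$, $v$, $v'$ and $w'$. As a minimal sketch of how such an angle can be measured from 3D coordinates, the standard dihedral formula is shown below in plain Python. This is not code from the paper: the function name `torsion_angle`, the helper names, and the sign convention are assumptions made for illustration only.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def torsion_angle(w, v, v_prime, w_prime):
    """Dihedral angle in degrees for atoms w - v - v' - w'.

    Hypothetical helper: rotation is measured about the v - v' bond,
    and the result lies in the interval [-180, 180].
    """
    b1 = _sub(v, w)              # bond w -> v
    b2 = _sub(v_prime, v)        # central bond v -> v' (rotation axis)
    b3 = _sub(w_prime, v_prime)  # bond v' -> w'
    n1 = _cross(b1, b2)          # normal of the plane (w, v, v')
    n2 = _cross(b2, b3)          # normal of the plane (v, v', w')
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not taken from any dataset): a cis arrangement
# and a trans arrangement around the same central bond.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
```

Here `cis` has magnitude 0 degrees and `trans` has magnitude 180 degrees. Using `atan2` rather than `acos` keeps the sign of the angle, which matters when comparing full angular distributions such as those in Figure 3.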
