Guided Multi-objective Generative AI to Enhance Structure-based Drug Design

Amit Kadan, Kevin Ryczko, Erika Lloyd, Adrian Roitberg, Takeshi Yamazaki

TL;DR

IDOLpro addresses the inverse-design challenge in structure-based drug design by combining diffusion-based molecule generation with differentiable, multi-objective guidance that co-optimizes binding affinity and synthetic accessibility. It optimizes the diffusion model's latent representations at a chosen optimization horizon $t_{hz}$ using gradients from differentiable scoring functions, then refines the resulting structures with gradient-based methods and the ANI2x potential, enabling rapid ligand design directly in the protein pocket. On the CrossDocked and Binding MOAD benchmarks, IDOLpro delivers state-of-the-art Vina improvements (e.g., $\approx 0.7$–$1.4$ kcal/mol) and higher QED, runs orders of magnitude faster and cheaper than exhaustive virtual screening, and supports lead optimization from scaffolded reference ligands. The framework is modular and extensible: it can incorporate additional scores (e.g., ADME-Tox) to accelerate hit-finding and lead optimization in drug discovery pipelines. The approach could enable faster, more reliable generation of drug-like ligands directly in protein pockets, shrinking the search space and enabling multi-property optimization in silico.
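The latent-optimization step summarized above (and depicted in Figure 1) can be illustrated with a small, dependency-free sketch. It is not IDOLpro's implementation and rests on loudly stated assumptions: a fixed linear map (`decoder`) is a hypothetical stand-in for finishing reverse diffusion from $t_{hz}$ to 0, the quadratic scores built by the hypothetical `make_quadratic` helper stand in for differentiable scores such as torchvina and torchSA, and the weights, learning rate, step budget, and `within_box` validity check are illustrative choices.

```python
"""Toy sketch of gradient-guided latent optimization at a diffusion horizon.

All components are hypothetical stand-ins, not IDOLpro's code:
- `decoder` (a fixed matrix W) replaces the remaining reverse-diffusion steps t_hz -> 0,
- `make_quadratic` scores replace differentiable scores such as torchvina/torchSA.
Lower scores are treated as better, so the weighted sum is minimized.
"""
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
ScoreFn = Callable[[Vector], Tuple[float, Vector]]  # returns (value, d value / d x)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matvec_t(m: Matrix, v: Vector) -> Vector:
    """Transposed product m^T @ v, used for the chain rule dS/dz = W^T dS/dx."""
    return [sum(m[r][c] * v[r] for r in range(len(m))) for c in range(len(m[0]))]


def make_quadratic(target: Vector) -> ScoreFn:
    """Hypothetical differentiable score: squared distance of x from `target`."""
    def score(x: Vector) -> Tuple[float, Vector]:
        diff = [xi - ti for xi, ti in zip(x, target)]
        return sum(d * d for d in diff), [2.0 * d for d in diff]
    return score


def optimize_latent(
    z: Vector,
    decoder: Matrix,
    scores: List[Tuple[float, ScoreFn]],
    is_valid: Callable[[Vector], bool],
    lr: float = 0.1,
    max_steps: int = 50,
) -> Tuple[Vector, float]:
    """Gradient descent on the horizon latent z; returns the best (z, score) seen."""
    best_z, best_s = list(z), float("inf")
    for _ in range(max_steps):
        x = matvec(decoder, z)           # "finish diffusion" to obtain a ligand
        if not is_valid(x):              # stop once no valid ligand can be decoded
            break
        total, grad_x = 0.0, [0.0] * len(x)
        for weight, score_fn in scores:  # weighted multi-objective sum
            value, g = score_fn(x)
            total += weight * value
            grad_x = [acc + weight * gi for acc, gi in zip(grad_x, g)]
        if total < best_s:
            best_z, best_s = list(z), total
        grad_z = matvec_t(decoder, grad_x)  # backpropagate to the latent
        z = [zi - lr * gi for zi, gi in zip(z, grad_z)]
    return best_z, best_s


def within_box(x: Vector) -> bool:
    """Hypothetical validity check: every coordinate lies inside a bounding box."""
    return all(abs(v) < 10.0 for v in x)


# Usage with toy numbers (not data from the paper).
W = [[1.0, 0.5], [0.0, 1.0]]
scores = [(1.0, make_quadratic([1.0, -1.0])),  # stand-in for a binding score
          (0.5, make_quadratic([0.0, 0.0]))]   # stand-in for a synthesizability score
z0 = [3.0, 3.0]
s_init = optimize_latent(z0, W, scores, within_box, max_steps=1)[1]
z_opt, s_opt = optimize_latent(z0, W, scores, within_box)
print(f"weighted score: {s_init:.3f} -> {s_opt:.3f}")  # prints "weighted score: 42.875 -> 0.667"
```

Because each score supplies its own gradient, adding another objective (for example, a differentiable ADME-Tox model, if one were available) amounts to appending another `(weight, score_fn)` pair; the chain rule through the decoder is unchanged.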

Abstract

Generative AI has the potential to revolutionize drug discovery. Yet, despite recent advances in deep learning, existing models cannot generate molecules that satisfy all desired physicochemical properties. Herein, we describe IDOLpro, a generative chemistry AI that combines diffusion with multi-objective optimization for structure-based drug design. Differentiable scoring functions guide the latent variables of the diffusion model to explore uncharted chemical space and generate novel ligands in silico, optimizing multiple target physicochemical properties. We demonstrate our platform's effectiveness by generating ligands with optimized binding affinity and synthetic accessibility on two benchmark sets. IDOLpro produces ligands with binding affinities more than 10–20% better than the next-best state-of-the-art method on each test set, producing more drug-like molecules with generally better synthetic accessibility scores than other methods. We compare IDOLpro head-to-head with a classic virtual screen of a large database of drug-like molecules. We show that IDOLpro can generate molecules for a range of important disease-related targets with better binding affinity and synthetic accessibility than any molecule found in the virtual screen, while being over 100x faster and less expensive to run. On a test set of experimental complexes, IDOLpro is the first method to produce molecules with better binding affinities than experimentally observed ligands. IDOLpro can accommodate other scoring functions (e.g., ADME-Tox) to accelerate hit-finding, hit-to-lead, and lead optimization for drug discovery.
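The abstract reports binding affinities as percentage improvements. Because Vina scores are negative (kcal/mol) and lower values indicate stronger predicted binding, a relative improvement can be computed as sketched below. This is one plausible convention, assumed for illustration; the paper's exact definition is not stated here, the function name is hypothetical, and the numbers in the usage line are made up.

```python
def percent_improvement(new: float, baseline: float) -> float:
    """Percent improvement of a Vina score over a baseline (lower is better).

    Assumes both scores are negative (kcal/mol); a positive result means `new`
    predicts stronger binding than `baseline`.
    """
    if baseline >= 0:
        raise ValueError("expected a negative baseline Vina score")
    return 100.0 * (new - baseline) / baseline


# Toy values, not results from the paper: -9.0 vs. -8.0 kcal/mol.
print(percent_improvement(-9.0, -8.0))  # prints 12.5
```

Dividing by the (negative) baseline flips the sign so that a more negative `new` score yields a positive improvement.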

Paper Structure

This paper contains 33 sections, 3 equations, 9 figures, 10 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Visual overview of IDOLpro. A random latent vector, $\mathbf{z}_T^{\ell}$, is sampled for each ligand in the batch, conditioned on the pocket coordinates $\mathbf{z}^p$. Reverse diffusion is run from time $T$ to the optimization horizon, time $t_{hz}$. The rest of the diffusion process is then completed, and the ligands are scored by evaluating a set of differentiable scores that define the target physicochemical properties. The gradient with respect to each latent vector at the optimization horizon, $\partial S_i / \partial \mathbf{z}_{t_{hz}}^{\ell}$, is used to take a gradient step in latent space. This process is iterated until a maximum number of steps has been reached or a valid ligand can no longer be generated from the current latent vector.
  • Figure 2: Performance of deep learning (DL) tools on two benchmark test sets. The scatter plots show the average Vina and SA scores of each method for targets in CrossDocked (left) and the Binding MOAD (right). IDOLpro lies at the bottom left of each scatter plot, showing that it can co-optimize Vina and SA for generated ligands. $\dagger$ Vina scores of reference ligands in the test set, redocked with QuickVina2. * Baseline method for IDOLpro.
  • Figure 3: Molecules produced by IDOLpro when optimizing torchvina and torchSA. One example from each test set is shown: protein 4aua from CrossDocked and protein 3cjo from the Binding MOAD. Left column: reference molecules from the test sets. Middle column: initial ligand produced by DiffSBDD before latent vector optimization. Right column: molecule produced by IDOLpro after optimizing torchvina and torchSA. In each example, the molecule produced by DiffSBDD has worse Vina and SA scores than the reference molecule; after optimization with IDOLpro, both the Vina and SA scores of the generated molecule are better than those of the reference. Visualizations were created with PyMOL [PyMOL], and interactions were visualized with the protein-ligand interaction profiler (PLIP) [adasme2021plip].
  • Figure 4: Comparison of IDOLpro with virtually screening ZINC250K. Top center: the average top-1 and top-10 Vina scores of IDOLpro-generated molecules compared with screened molecules from ZINC250K across all 10 targets. Bottom left: the distribution of Vina scores for generated and screened ligands for EGFR, an important oncology target (PDB ID 2rgp). Bottom right: the distribution of Vina scores for generated and screened ligands for the SARS-CoV-2 main protease (PDB ID 7l11).
  • Figure 5: Molecules produced by IDOLpro during lead optimization. The examples shown are ligand 1ly docked into protein 1a2g and ligand plp docked into protein 2jjg, both from the CrossDocked test set. IDOLpro is used to extend the scaffold while optimizing torchvina and torchSA, and yields multiple ligands with better SA and Vina scores than the reference molecule.
  • ...and 4 more figures
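Figure 4 summarizes results as average top-1 and top-10 Vina scores. One plausible reading of that metric, assumed here for illustration, is: for each target, take the mean of the k best (most negative) Vina scores, then average those per-target values across all targets. The function names and toy scores below are hypothetical and are not the paper's data.

```python
from statistics import mean
from typing import Dict, List


def top_k_mean(vina_scores: List[float], k: int) -> float:
    """Mean of the k best (most negative) Vina scores for a single target."""
    if not 1 <= k <= len(vina_scores):
        raise ValueError("k must be between 1 and the number of scores")
    return mean(sorted(vina_scores)[:k])


def average_top_k(per_target: Dict[str, List[float]], k: int) -> float:
    """Average the per-target top-k means over all targets."""
    return mean(top_k_mean(scores, k) for scores in per_target.values())


# Toy scores in kcal/mol for two hypothetical targets (not the paper's data).
toy = {
    "target_a": [-7.0, -9.5, -8.0],
    "target_b": [-6.0, -10.0, -7.5],
}
print(average_top_k(toy, 1))  # prints -9.75
print(average_top_k(toy, 2))  # prints -8.75
```

Averaging per target first keeps a target with many generated or screened molecules from dominating the summary.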