It Takes Two to Tango: Directly Optimizing for Constrained Synthesizability in Generative Molecular Design

Jeff Guo, Philippe Schwaller

Abstract

Constrained synthesizability is an unaddressed challenge in generative molecular design. In particular, the task is to design molecules that satisfy multi-parameter optimization objectives while being synthesizable and enforcing the presence of specific commercial building blocks in the synthesis. This is practically important for molecule re-purposing, sustainability, and efficiency. In this work, we propose a novel reward function called TANimoto Group Overlap (TANGO), which uses chemistry principles to transform a sparse reward function into a dense and learnable reward function, a property that is crucial for reinforcement learning. TANGO can augment general-purpose molecular generative models to directly optimize for constrained synthesizability while simultaneously optimizing for other properties relevant to drug discovery using reinforcement learning. Our framework is general and addresses starting-material, intermediate, and divergent synthesis constraints. Contrary to most existing works in the field, we show that incentivizing a general-purpose model (one without any inductive biases) is a productive approach to navigating challenging optimization scenarios. We demonstrate this by showing that the trained models explicitly learn a desirable distribution. Our framework is the first generative approach to tackle constrained synthesizability.

Paper Structure

This paper contains 24 sections, 4 equations, 13 figures, 10 tables, 1 algorithm.

Figures (13)

  • Figure 1: TANGO guides the generation of molecules directly optimized for constrained synthesizability with enforced building blocks while simultaneously optimizing other properties. Our method generalizes across starting-material, intermediate, and divergent synthesis constraints.
  • Figure 2: TANGO reward function: the maximum similarity between each non-root node molecule (the root node is the generated molecule) and the set of enforced building blocks. Every synthesizable generated molecule returns a non-zero reward.
  • Figure 3: Example generated molecules under the starting-material and divergent synthesis (one-step synthesis from a non-commercial common intermediate to diverse, high-reward molecules) constraints. The docking scores and QED values are annotated. For the divergent synthesis graph, the $\Delta$ docking score (negative is better) and QED (positive is better) are additionally annotated.
  • Figure 4: The model learns a distribution of molecules that satisfy the MPO objective. The final model checkpoint from the 100 enforced building blocks experiment (all 10 seeds) was used to sample 1,000 unique molecules. a. Counts of solvable molecules from the checkpoints with the mean and standard deviation reported (non-bolded). Only 1 out of 10 final model checkpoints was unable to yield "Solved (enforced)" molecules. The pre-trained model (before RL) generates mostly unsynthesizable molecules and no synthesizable molecules with enforced blocks (metrics are bolded). b. Docking Score (DS) and QED values of the pooled Solved (Enforced) molecules across all seeds. Panels c, d, and e use 1,000 unique molecules sampled from one final model checkpoint. c. UMAP of sampled molecules compared to the pre-trained model. d. Negative log-likelihoods (NLLs) of the sampled molecules. The sampled molecules are much more likely to be generated under the final model checkpoint. e. Top-10 (by docking score) molecules with the enforced building block highlighted. The NLLs are similar.
  • Figure D5: Reward shaping function for docking.
  • ...and 8 more figures
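
The Figure 2 caption describes the reward as a maximum similarity between the non-root nodes of a synthesis route and the enforced building blocks. The sketch below illustrates only that idea and contrasts it with a sparse exact-match reward. It is not the paper's TANGO definition: fingerprints are represented as plain sets of integer on-bits, and the names `tanimoto`, `sparse_reward`, and `max_similarity_reward` are hypothetical. A real pipeline would presumably obtain fingerprints from a cheminformatics toolkit and route nodes from a retrosynthesis model.

```python
from typing import FrozenSet, Iterable

# Assumption: a fingerprint is modeled as the set of its on-bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sparse_reward(non_root_nodes: Iterable[Fingerprint],
                  enforced_blocks: Iterable[Fingerprint]) -> float:
    """1.0 only if some route node exactly matches an enforced block."""
    blocks = set(enforced_blocks)
    return 1.0 if any(node in blocks for node in non_root_nodes) else 0.0


def max_similarity_reward(non_root_nodes: Iterable[Fingerprint],
                          enforced_blocks: Iterable[Fingerprint]) -> float:
    """Maximum pairwise similarity between route nodes and enforced blocks.

    Returns 0.0 when there are no route nodes (hypothetical convention for
    a molecule with no synthesis route).
    """
    blocks = list(enforced_blocks)
    return max(
        (tanimoto(node, block) for node in non_root_nodes for block in blocks),
        default=0.0,
    )


# Toy route: two non-root nodes, neither identical to the enforced block.
route_nodes = [frozenset({1, 2, 3, 4}), frozenset({5, 6})]
enforced = [frozenset({1, 2, 3, 9})]

print(sparse_reward(route_nodes, enforced))          # no exact match
print(max_similarity_reward(route_nodes, enforced))  # partial credit
```

In this toy case the exact-match reward gives no signal, while the similarity-based reward gives partial credit for a near miss, which is the sense in which a dense reward is easier to learn from. Unlike the reward described in the caption, this simplified version can still return zero for a route whose nodes share no bits with any enforced block.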