Retrosynthetic Planning with Dual Value Networks

Guoqing Liu; Di Xue; Shufang Xie; Yingce Xia; Austin Tripp; Krzysztof Maziarz; Marwin Segler; Tao Qin; Zongzhang Zhang; Tie-Yan Liu

TL;DR

This work proposes a novel online training algorithm, Planning with Dual Value Networks (PDVN), which alternates between a planning phase and an updating phase and builds two separate value networks to predict the synthesizability and the cost of molecules, respectively.

Abstract

Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design. Recently, the combination of ML-based single-step reaction predictors with multi-step planners has led to promising results. However, the single-step predictors are mostly trained offline to optimize the single-step accuracy, without considering complete routes. Here, we leverage reinforcement learning (RL) to improve the single-step predictor, by using a tree-shaped MDP to optimize complete routes. Specifically, we propose a novel online training algorithm, called Planning with Dual Value Networks (PDVN), which alternates between a planning phase and an updating phase. In PDVN, we construct two separate value networks to predict the synthesizability and cost of molecules, respectively. To maintain the single-step accuracy, we design a two-branch network structure for the single-step predictor. On the widely used USPTO dataset, our PDVN algorithm improves the search success rate of existing multi-step planners (e.g., increasing the success rate from 85.79% to 98.95% for Retro*, and reducing the number of model calls by half while solving 99.47% of molecules for RetroGraph). Additionally, PDVN helps find shorter synthesis routes (e.g., reducing the average route length from 5.76 to 4.83 for Retro*, and from 5.63 to 4.78 for RetroGraph). Our code is available at https://github.com/DiXue98/PDVN.

Paper Structure (33 sections, 7 equations, 5 figures, 7 tables)

Introduction
Related Work
Single-Step Retrosynthesis.
Multi-Step Retrosynthesis.
Method
Retrosynthesis MDP
Dual Value Networks.
Planning with Dual Value Networks
Selection.
Expansion.
Backup.
Training on Generated Experiences
Policy Network.
Synthesizability Value Network.
Cost Value Network.
...and 18 more sections

Figures (5)

Figure 1: a) The single-step reaction predictor, which predicts potential ways to break a molecule into reactants at each step. b) The multi-step planner, which searches for a complete route by iteratively calling the predictor. The goal of retrosynthesis is to find a synthesis route for a target molecule that ends in building block molecules.
Figure 2: An illustration of our PDVN algorithm. The PDVN algorithm has three modules: 1) a two-branch policy network; 2) a synthesizability value network that predicts if a molecule can be synthesized; 3) a cost value network that predicts the required synthesis cost if synthesizable. PDVN is initialized with an offline SL model and alternates between two phases: 1) Planning phase: simulate synthesis experiences on the tree-shaped MDP under the guidance of the policy network and dual value networks. 2) Updating phase: extract useful training targets from the generated experiences and update all three networks. Finally, the single-step model trained by PDVN is plugged into popular multi-step planners to enhance their performance.
Figure 3: An illustrative example of the tree-shaped MDP. Starting from the target molecule, chemists recursively choose reactions (denoted by orange rectangles) to break down the product molecules (denoted by blue circles) into reactants, until reaching building block molecules (denoted by green circles) or dead-end molecules (denoted by red circles). In this example, the route is not synthesizable, as there is a red dead-end leaf node $S_{(1,2)}$.
Figure 4: An illustration of the two-branch policy network. The reference single-step model provides a realistic subset of reactions for the input molecule, denoted by Reaction $1$, …, Reaction $k$. The learnable single-step network optimizes a probability distribution over the selected reactions, i.e., $P_{i}$ is the probability of $\text{Reaction } i$.
Figure 5: Case study of an exemplary route predicted with PDVN. The arrow represents the single-step chemical reaction, and the molecules at the end of the synthesis route are building block molecules.
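
The synthesizability rule described in the Figure 3 caption can be illustrated with a small recursive check. The sketch below is not taken from the PDVN code base: the `Molecule` class, the function names, and the toy molecules are hypothetical, and counting one unit per reaction is an assumed stand-in for the paper's cost definition. It treats a route as synthesizable only if every leaf is a building block, so a single dead-end leaf such as $S_{(1,2)}$ makes the whole route fail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Molecule:
    """Hypothetical node of a route in the tree-shaped MDP."""
    name: str
    is_building_block: bool = False
    # Reactants of the single reaction chosen for this molecule.
    # None means no reaction was chosen, i.e. the node is a leaf.
    reactants: Optional[List["Molecule"]] = None


def is_synthesizable(mol: Molecule) -> bool:
    """A route is synthesizable only if all of its leaves are building blocks."""
    if mol.is_building_block:
        return True
    if mol.reactants is None:
        return False  # dead-end leaf
    return all(is_synthesizable(r) for r in mol.reactants)


def num_reactions(mol: Molecule) -> Optional[int]:
    """Number of reactions in the route (assumed unit cost), or None if it fails."""
    if not is_synthesizable(mol):
        return None
    if mol.is_building_block:
        return 0
    return 1 + sum(num_reactions(r) for r in mol.reactants)


# Toy route with one dead-end leaf (labels are illustrative only).
dead_end_route = Molecule(
    "target",
    reactants=[
        Molecule("S_(1,1)", is_building_block=True),
        Molecule("S_(1,2)"),  # neither a building block nor expanded
    ],
)

# Toy route in which every leaf is a building block.
solved_route = Molecule(
    "target",
    reactants=[
        Molecule("A", is_building_block=True),
        Molecule("B", reactants=[Molecule("C", is_building_block=True)]),
    ],
)
```

Here `is_synthesizable(dead_end_route)` evaluates to `False`, while `solved_route` is synthesizable with two reactions. These are the two kinds of quantities the abstract assigns to the synthesizability and cost value networks, although the networks predict them rather than compute them from a finished route.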
