Retrosynthesis Prediction via Search in (Hyper) Graph

Zixun Lan, Binjie Hong, Jiajun Zhu, Zuo Zeng, Zhenfu Liu, Limin Yu, Fei Ma

TL;DR

This work addresses retrosynthesis prediction by proposing RetroSiG, a semi-template-based framework that reframes reaction center identification as a search in the product molecular graph and leaving-group completion as a search in a leaving-group hypergraph. By leveraging an Edge Graph Attention Network for the product graph and a Hyper Graph Neural Network for the hypergraph, RetroSiG captures complex reactions, including those with multiple reaction centers and repeated leaving groups, while the one-hop constraint reduces the search space. The approach achieves competitive Top-k exact-match accuracy on USPTO-50K. Ablations demonstrate the effectiveness of the hypergraph representation and the one-hop constraint, and further experiments highlight its capability in predicting complex reactions. Overall, RetroSiG advances interpretable, scalable retrosynthesis by combining structured chemical priors with reinforcement-learning-based search, extending beyond the limitations of existing template-based and template-free methods.

Abstract

Predicting the reactants of a specified core product, termed retrosynthesis prediction, is a fundamental challenge in organic synthesis. Recently, semi-template-based and graph-edits-based methods have achieved good performance in terms of both interpretability and accuracy. However, due to their mechanisms, these methods cannot predict complex reactions, e.g., reactions with multiple reaction centers or reactions that attach the same leaving group to more than one atom. In this study, we propose a semi-template-based method, the Retrosynthesis via Search in (Hyper) Graph (RetroSiG) framework, to alleviate these limitations. In the proposed method, we recast the reaction center identification and leaving group completion tasks as searches in the product molecular graph and the leaving group hypergraph, respectively. As a semi-template-based method, RetroSiG has several advantages. First, its search mechanism allows it to handle the complex reactions mentioned above. Second, RetroSiG naturally exploits the hypergraph to model the implicit dependencies between leaving groups. Third, RetroSiG makes full use of a prior, the one-hop constraint, which reduces the search space and enhances overall performance. Comprehensive experiments demonstrate that RetroSiG achieves competitive results. Furthermore, we conduct experiments to show the capability of RetroSiG in predicting complex reactions. Ablation experiments verify the efficacy of specific elements, such as the one-hop constraint and the leaving group hypergraph.

Paper Structure (21 sections, 9 equations, 4 figures, 5 tables)


Figures (4)

  • Figure 1: Architecture overview. RetroSiG first identifies the reaction center via (a) search in the product molecular graph and then completes the leaving groups via (b) search in the leaving group hypergraph. Finally, RetroSiG converts the predicted subgraph into reaction center SMARTS and leaving group SMARTS, then merges them into one retrosynthesis template. RetroSiG obtains the reactants by applying the merged template to the given product. At state $t$, the red highlighted part represents the explored nodes, and the green nodes denote the action space after applying the one-hop constraint.
  • Figure 2: (a) Policy network architecture for the search in the product molecular graph. (b) Policy network architecture for the search in the leaving group hypergraph.
  • Figure 3: Analysis of complex samples.
  • Figure 4: Visualization of example predictions.
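
The one-hop constraint described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the constraint means that the next node to explore must be adjacent to an already explored node. The function name `one_hop_action_space`, the adjacency-dictionary representation, and the toy graph are hypothetical and are not taken from the paper's implementation.

```python
from typing import Dict, Set


def one_hop_action_space(adjacency: Dict[int, Set[int]], explored: Set[int]) -> Set[int]:
    """Return the unexplored nodes that lie within one hop of the explored set.

    Assumption: when nothing has been explored yet, every node is a valid
    first action. The paper may handle the initial state differently.
    """
    if not explored:
        return set(adjacency)
    frontier: Set[int] = set()
    for node in explored:
        # Collect every neighbor of an explored node.
        frontier |= adjacency[node]
    # Already explored nodes are not valid actions.
    return frontier - explored


# Hypothetical toy graph: a chain 0-1-2-3-4 with an extra branch 2-5.
adjacency = {
    0: {1},
    1: {0, 2},
    2: {1, 3, 5},
    3: {2, 4},
    4: {3},
    5: {2},
}

print(sorted(one_hop_action_space(adjacency, {2})))     # [1, 3, 5]
print(sorted(one_hop_action_space(adjacency, {1, 2})))  # [0, 3, 5]
```

In this toy graph the constraint shrinks the candidate set from all unexplored nodes to only those bordering the explored region, which is the sense in which the abstract says the prior reduces the search space.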