GDiffRetro: Retrosynthesis Prediction with Dual Graph Enhanced Molecular Representation and Diffusion Generation

Shengyin Sun, Wenhao Yu, Yuxiang Ren, Weitao Du, Liwei Liu, Xuecang Zhang, Ying Hu, Chen Ma

TL;DR

GDiffRetro tackles retrosynthesis by integrating a dual-graph enhanced molecular representation to improve reaction-center identification and a 3D conditional diffusion model to generate reactants from synthons. The dual-graph component leverages face information by combining representations from a face-centric dual graph with the original molecular graph, improving reaction-center scoring. The reactant-generation stage uses a 3D diffusion process with an equivariant graph neural network, conditioned on synthons to produce chemically plausible, diverse reactants. Empirically, it achieves state-of-the-art top-1 performance among semi-template and template-free methods on USPTO-50k and competitive results versus template-based models, with ablations confirming the effectiveness of both the dual-graph and diffusion components.

Abstract

Retrosynthesis prediction focuses on identifying reactants capable of synthesizing a target product. Typically, retrosynthesis prediction involves two phases: Reaction Center Identification and Reactant Generation. However, we argue that most existing methods suffer from one limitation in each phase: (i) Existing models do not adequately capture the "face" information in molecular graphs for reaction center identification. (ii) Current approaches for reactant generation predominantly use sequence generation in a 2D space, which lacks versatility in generating reasonable distributions for completed reactive groups and overlooks molecules' inherent 3D properties. To overcome these limitations, we propose GDiffRetro. For reaction center identification, GDiffRetro integrates the original graph with its corresponding dual graph to represent molecular structures, which helps guide the model to focus more on the faces in the graph. For reactant generation, GDiffRetro employs a conditional diffusion model in 3D to transform the obtained synthon into a complete reactant. Our experimental findings reveal that GDiffRetro outperforms state-of-the-art semi-template models across various evaluative metrics.
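To make the idea of synthon-conditioned 3D diffusion more concrete, the following is a minimal sketch of a forward noising step in which synthon atoms are held fixed and only the atoms to be completed are perturbed. It uses the standard DDPM marginal x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps. The function name `noise_new_atoms`, the boolean mask `is_synthon`, and the choice to keep context coordinates exactly unchanged are assumptions made for illustration; the abstract does not specify the noise schedule or how the conditioning is implemented in GDiffRetro.

```python
import math
import random


def noise_new_atoms(coords, is_synthon, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) for non-synthon atoms only.

    coords:     list of (x, y, z) tuples, the clean reactant geometry x_0
    is_synthon: list of bools, True for atoms that belong to the synthon
                (hypothetical conditioning mask, not taken from the paper)
    alpha_bar:  cumulative signal level in [0, 1] at the chosen timestep
    rng:        random.Random instance, passed in for reproducibility
    """
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    noisy = []
    for xyz, fixed in zip(coords, is_synthon):
        if fixed:
            # Assumption: the synthon acts as fixed context and is not noised.
            noisy.append(tuple(xyz))
        else:
            # Standard Gaussian perturbation applied per coordinate.
            noisy.append(tuple(signal * c + sigma * rng.gauss(0.0, 1.0) for c in xyz))
    return noisy


rng = random.Random(0)
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.2, 0.0)]
is_synthon = [True, True, False]  # the last atom is the part to be generated

x_t = noise_new_atoms(coords, is_synthon, alpha_bar=0.5, rng=rng)
x_clean = noise_new_atoms(coords, is_synthon, alpha_bar=1.0, rng=rng)
```

In a full model, a denoising network (the TL;DR mentions an equivariant graph neural network) would be trained to reverse this corruption for the unmasked atoms while reading the synthon as context; that part is omitted here.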

Paper Structure (22 sections, 21 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: The framework of GDiffRetro. In stage #1, we utilize the dual graph to enhance the representations. In stage #2, we employ a 3D diffusion model (conditioned on the obtained synthon) to convert synthons into reactants.
  • Figure 2: An example of dual graph construction. Each node in the dual graph corresponds to a face in the original graph, and the type of each edge in the dual graph depends on the type of the edge it crosses in the original graph (Type 1 $\Leftrightarrow$ Type A, Type 2 $\Leftrightarrow$ Type B). More details about dual graphs and faces are in the supplementary material.
  • Figure 3: Depiction of the overall Retrosynthesis Prediction process for examples in the protections reaction class. Reaction centers are highlighted on the products and synthons, while the completed parts are outlined with a circle on the reactants.
  • Figure 4: Trajectory of reactant generation. The atoms undergoing changes are highlighted with a rectangle.
  • Figure 5: Top-3 generated reactants for distinct synthons. Correctly completed parts are denoted with Hit!, while incorrect completions are marked with Miss!.
  • ...and 4 more figures
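
The dual graph construction described in the Figure 2 caption can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation. It assumes that the bounded faces of the planar molecular graph are supplied as a list of rings (atom indices in ring order), that one extra node stands for the outer face, and that each dual edge inherits the type of the bond it crosses. The function name `build_dual_graph` and the input layout are hypothetical.

```python
from collections import defaultdict


def build_dual_graph(bonds, rings):
    """Build a face-centric dual graph of a planar molecular graph.

    bonds: dict mapping frozenset({u, v}) -> bond type (any hashable label)
    rings: list of atom-index tuples in ring order; each ring is treated
           as one bounded face (assumption: rings are the minimal faces)
    Returns (num_faces, dual_edges); face index len(rings) is the outer face.
    """
    outer = len(rings)
    faces_of_bond = defaultdict(list)
    for face_id, ring in enumerate(rings):
        for i, u in enumerate(ring):
            v = ring[(i + 1) % len(ring)]
            faces_of_bond[frozenset((u, v))].append(face_id)

    dual_edges = []
    for bond, bond_type in bonds.items():
        faces = faces_of_bond.get(bond, [])
        if len(faces) == 2:
            f, g = faces            # bond shared by two rings
        elif len(faces) == 1:
            f, g = faces[0], outer  # bond between a ring and the outer face
        elif len(faces) == 0:
            f, g = outer, outer     # acyclic bond: self-loop on the outer face
        else:
            raise ValueError("bond lies on more than two faces; rings are not planar faces")
        dual_edges.append((f, g, bond_type))
    return outer + 1, dual_edges


# Naphthalene-like skeleton (two fused six-membered rings) plus one substituent.
ring_a = (0, 1, 2, 3, 4, 5)
ring_b = (4, 6, 7, 8, 9, 5)
bonds = {}
for ring in (ring_a, ring_b):
    for i, u in enumerate(ring):
        bonds[frozenset((u, ring[(i + 1) % len(ring)]))] = "aromatic"
bonds[frozenset((0, 10))] = "single"  # substituent atom 10 attached to atom 0

num_faces, dual_edges = build_dual_graph(bonds, [ring_a, ring_b])
```

Here the fused bond between atoms 4 and 5 yields a dual edge joining the two ring faces, which is the kind of ring-level adjacency the original atom-bond graph does not represent directly. Every bond in the original graph produces exactly one dual edge, so the two graphs have the same number of edges.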