GDiffRetro: Retrosynthesis Prediction with Dual Graph Enhanced Molecular Representation and Diffusion Generation
Shengyin Sun, Wenhao Yu, Yuxiang Ren, Weitao Du, Liwei Liu, Xuecang Zhang, Ying Hu, Chen Ma
TL;DR
GDiffRetro tackles retrosynthesis by integrating a dual-graph enhanced molecular representation to improve reaction-center identification and a 3D conditional diffusion model to generate reactants from synthons. The dual-graph component leverages face information by combining representations from a face-centric dual graph with the original molecular graph, improving reaction-center scoring. The reactant-generation stage uses a 3D diffusion process with an equivariant graph neural network, conditioned on synthons to produce chemically plausible, diverse reactants. Empirically, it achieves state-of-the-art top-1 performance among semi-template and template-free methods on USPTO-50k and competitive results versus template-based models, with ablations confirming the effectiveness of both the dual-graph and diffusion components.
Abstract
Retrosynthesis prediction focuses on identifying reactants capable of synthesizing a target product. Typically, retrosynthesis prediction involves two phases: Reaction Center Identification and Reactant Generation. However, we argue that most existing methods suffer from one limitation in each phase: (i) For reaction center identification, existing models do not adequately capture the "face" information in molecular graphs. (ii) For reactant generation, current approaches predominantly use sequence generation in a 2D space, which lacks versatility in generating reasonable distributions for completed reactive groups and overlooks molecules' inherent 3D properties. To overcome these limitations, we propose GDiffRetro. For reaction center identification, GDiffRetro integrates the original graph with its corresponding dual graph to represent molecular structures, which helps guide the model to focus more on the faces in the graph. For reactant generation, GDiffRetro employs a conditional diffusion model in 3D to transform the obtained synthon into a complete reactant. Our experimental findings reveal that GDiffRetro outperforms state-of-the-art semi-template models across various evaluation metrics.
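To make the dual-graph idea concrete, the following is a minimal sketch of how a dual graph could be derived from a molecular graph. It is an illustration under stated assumptions, not the construction used in GDiffRetro: it treats each ring as one bounded face, adds a single outer face, assumes the molecular graph is planar, and takes the ring lists as given (in practice they would come from a ring-perception routine). The names `ring_bonds` and `build_dual_graph` are hypothetical.

```python
from collections import defaultdict


def ring_bonds(ring):
    """Return the bonds along a ring as sorted atom-index pairs."""
    n = len(ring)
    return {tuple(sorted((ring[i], ring[(i + 1) % n]))) for i in range(n)}


def build_dual_graph(bonds, rings):
    """Build a dual multigraph with one node per ring face plus an outer face.

    Assumption: every bond borders at most two ring faces (planar molecule).
    Returns (num_faces, dual_edges); each dual edge is (face_a, face_b, bond),
    where `bond` is the primal bond separating the two faces.
    """
    outer = len(rings)  # index reserved for the unbounded outer face
    faces_of_bond = defaultdict(list)
    for face_id, ring in enumerate(rings):
        for bond in ring_bonds(ring):
            faces_of_bond[bond].append(face_id)

    dual_edges = []
    for a, b in bonds:
        bond = tuple(sorted((a, b)))
        faces = faces_of_bond.get(bond, [])
        if len(faces) == 2:      # bond shared by two rings (fused ring system)
            dual_edges.append((faces[0], faces[1], bond))
        elif len(faces) == 1:    # bond on the boundary between a ring and the outside
            dual_edges.append((faces[0], outer, bond))
        elif len(faces) == 0:    # acyclic bond: both sides are the outer face
            dual_edges.append((outer, outer, bond))
        else:
            raise ValueError(f"bond {bond} borders more than two rings")
    return len(rings) + 1, dual_edges


# Naphthalene carbon skeleton: two six-membered rings fused at bond (4, 5).
naphthalene_rings = [[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8, 9]]
naphthalene_bonds = sorted(set().union(*(ring_bonds(r) for r in naphthalene_rings)))
num_faces, dual_edges = build_dual_graph(naphthalene_bonds, naphthalene_rings)
```

In this toy example the two ring faces become adjacent dual nodes because they share the fusion bond, which is the kind of ring-level relationship that an atom-level graph only encodes implicitly. A model could then, for instance, compute representations on both graphs and combine them when scoring bonds as candidate reaction centers; how GDiffRetro performs that combination is not specified in this section.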
