Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
Songtao Liu, Hanjun Dai, Yue Zhao, Peng Liu
TL;DR
This work addresses the challenge of generating feasible, criterion-aligned chemical synthesis routes by combining a conditional residual energy-based model (CREBM) with existing retrosynthesis strategies. It frames route generation as a probabilistic combination $P_\theta(\mathcal{T}|m_{tar},c) \propto P_{Retro}(\mathcal{T}|m_{tar}) \exp(-E_\theta(\mathcal{T}|m_{tar},c))$, in which a learnable energy function steers generation toward routes that satisfy criteria such as feasibility and cost. The authors adopt a preference-based training regime inspired by reward modeling in LLMs, using a Bradley–Terry loss over route comparisons ranked by a heuristic feasibility reward $\varphi$, and implement $E_\theta$ as a Transformer. In experiments on RetroBench, CREBM consistently improves top-1 accuracy across diverse base strategies, with larger gains for deeper routes, which supports the framework's plug-and-play effectiveness and its potential for controllable synthesis planning. The work offers a practical way to bring route-level criteria into molecule synthesis workflows without retraining the base retrosynthesis models.
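The preference-based objective can be illustrated with a minimal sketch. It assumes that lower energy means a better route (consistent with the $\exp(-E_\theta)$ factor above) and that the route with the higher $\varphi$ in a pair is the preferred one. The function name and the scalar inputs are hypothetical; in the paper the energies come from a Transformer, not from hand-set numbers.

```python
import math


def bradley_terry_loss(energy_preferred: float, energy_rejected: float) -> float:
    """Negative log-likelihood that the preferred route wins a pairwise comparison.

    Assumption: a route's Bradley-Terry "score" is its negative energy, so
    P(preferred beats rejected) = sigmoid(E_rejected - E_preferred).
    """
    margin = energy_rejected - energy_preferred
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy energies: the preferred route already has lower energy, so the loss is small.
loss_good = bradley_terry_loss(energy_preferred=0.0, energy_rejected=2.0)
# Reversed ordering: the model ranks the pair wrongly, so the loss is larger.
loss_bad = bradley_terry_loss(energy_preferred=2.0, energy_rejected=0.0)
```

Minimizing this loss over many route pairs pushes the energy of preferred routes below that of rejected ones, which is what lets the energy term re-weight the base model's distribution.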
Abstract
Molecule synthesis through machine learning is one of the fundamental problems in drug discovery. Current data-driven strategies employ one-step retrosynthesis models and search algorithms to predict synthetic routes in a top-down manner. Although effective, these strategies are limited in synthetic route generation because they greedily select the next molecule set without any lookahead. Furthermore, existing strategies cannot control the generation of synthetic routes according to criteria such as material costs, yields, and step count. In this work, we propose a general and principled framework based on conditional residual energy-based models (EBMs) that focus on the quality of the entire synthetic route with respect to the specified criteria. By incorporating an additional energy-based function into our probabilistic model, our proposed algorithm can enhance the quality of the most probable synthetic routes generated by various strategies in a plug-and-play fashion. Extensive experiments demonstrate that our framework can consistently boost performance across various strategies and outperforms previous state-of-the-art top-1 accuracy by a margin of 2.5%. Code is available at https://github.com/SongtaoLiu0823/CREBM.
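To make the plug-and-play idea concrete, the sketch below re-ranks candidate routes from any base strategy by adding a negative energy term to each route's base log-probability, following the residual form $P_{Retro} \cdot \exp(-E_\theta)$. It is an illustration under stated assumptions, not the released implementation: the `Route` type, the `rerank` function, and the toy energy table are hypothetical, and normalization is taken over the finite candidate set only.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a route is a tuple of reaction strings.
Route = Tuple[str, ...]


def rerank(
    candidates: Sequence[Tuple[Route, float]],
    energy: Callable[[Route], float],
) -> List[Tuple[Route, float]]:
    """Re-rank (route, log_p_retro) pairs by log_p_retro - energy(route).

    Returns routes sorted best-first with probabilities normalized over
    the given candidates (an approximation to the full distribution).
    """
    if not candidates:
        return []
    scores = [log_p - energy(route) for route, log_p in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(route, math.exp(score - log_z)) for (route, _), score in ranked]


# Toy example: the base model prefers route "a", but the energy penalizes it.
toy_candidates = [(("a",), math.log(0.6)), (("b",), math.log(0.4))]
toy_energy = {("a",): 1.0, ("b",): 0.0}
reranked = rerank(toy_candidates, lambda route: toy_energy[route])
```

Because only the scores of already-generated routes change, the base retrosynthesis model is left untouched; any strategy that can emit candidate routes with probabilities could be wrapped this way.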
