Reinforced Reasoning for End-to-End Retrosynthetic Planning

Chenyang Zuo, Siqi Fan, Yizhen Luo, Zaiqing Nie

Abstract

Retrosynthetic planning is a fundamental task in organic chemistry, yet it remains challenging due to its combinatorial complexity. Conventional approaches typically rely on hybrid frameworks that combine single-step predictions with external search heuristics, inevitably fracturing the logical coherence between local molecular transformations and global planning objectives. To bridge this gap and embed strategic foresight directly into the model's chemical reasoning, we introduce ReTriP, an end-to-end generative framework that reformulates retrosynthesis as a direct Chain-of-Thought reasoning task. We establish a path-coherent molecular representation and employ a progressive training curriculum that transitions from reasoning distillation to reinforcement learning with verifiable rewards, effectively aligning stepwise generation with practical route utility. Empirical evaluation on RetroBench demonstrates that ReTriP achieves state-of-the-art performance, exhibiting superior robustness in long-horizon planning compared to hybrid baselines.

Paper Structure

This paper contains 36 sections, 6 equations, 7 figures, 5 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Comparison of retrosynthetic paradigms. (a) Decoupled search-based method; (b) Our unified ReTriP framework.
  • Figure 2: The ReTriP framework. (a) Path-coherent data construction: (a1) The target molecule undergoes notational augmentation. (a2) Iterative alignment ensures fragment traceability. (b) Progressive training curriculum: (b1) A three-stage SFT process transitions from trajectory modeling to reasoning distillation and loss-rebalancing calibration. (b2) RLVR aligns stepwise logic with precursor availability and synthetic efficiency. (c) Inference and scaling: A ranking-guided TTA selects optimal input notations, followed by a consensus-based voting mechanism.
  • Figure 3: Top-1 accuracy across different synthetic route depths.
  • Figure 4: Top-k accuracy of the final model across different TTA sizes.
  • Figure 5: Qualitative case study. A generated 9-step retrosynthetic plan, featuring integrated CoT reasoning for critical steps.
  • ...and 2 more figures
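
The consensus-based voting mechanism described in Figure 2(c) can be sketched as a simple majority vote over routes generated from differently augmented input notations. This is a minimal illustration, not the authors' implementation: the function name `consensus_vote` and the string-serialized route format are hypothetical.

```python
from collections import Counter

def consensus_vote(candidate_routes):
    """Return the route proposed most often across TTA samples.

    `candidate_routes` is a list of routes, each serialized as a
    canonical string (a hypothetical encoding for illustration).
    """
    counts = Counter(candidate_routes)
    route, _ = counts.most_common(1)[0]
    return route

# Routes generated from several augmented notations of one target
samples = ["A>>B>>C", "A>>B>>C", "A>>D>>C"]
print(consensus_vote(samples))  # -> A>>B>>C
```

In practice the candidates would come from the ranking-guided TTA step, with each augmented notation of the target molecule producing one candidate plan; the vote then selects the plan the model agrees on most consistently.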