RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction

Hanbum Ko, Chanhui Lee, Ye Rin Kim, Rodrigo Hormazabal, Sehui Han, Sungbin Lim, Sungwoong Kim

Abstract

Retrosynthesis prediction is a core task in organic synthesis that aims to predict the reactants for a given product molecule. Traditionally, chemists select a plausible bond disconnection and derive the corresponding reactants, which is time-consuming and requires substantial expertise. Although molecular large language models (LLMs) have made progress on this task, many methods either predict reactants without strategic reasoning or conduct only a generic product analysis, rather than reasoning explicitly about bond-disconnection strategies that logically lead to the choice of specific reactants. To overcome these limitations, we propose RetroReasoner, a retrosynthetic reasoning model that leverages chemists' strategic thinking. RetroReasoner is trained using both supervised fine-tuning (SFT) and reinforcement learning (RL). For SFT, we introduce SyntheticRetro, a framework that generates structured disconnection rationales alongside reactant predictions. For RL, we use round-trip accuracy as the reward: predicted reactants are passed through a forward synthesis model, and a prediction is rewarded when the forward-predicted product matches the original input product. Experimental results show that RetroReasoner not only outperforms prior baselines but also generates a broader range of feasible reactant proposals, particularly on more challenging reaction instances.
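
The round-trip reward described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `round_trip_reward`, `toy_forward_model`, and `canonicalize` are hypothetical. The binary 0/1 reward and the plain string comparison are assumptions. A real setup would use a learned forward synthesis model and would likely compare canonical SMILES produced by a cheminformatics toolkit.

```python
from typing import Callable, Sequence


def round_trip_reward(
    product: str,
    predicted_reactants: Sequence[str],
    forward_model: Callable[[str], str],
    canonicalize: Callable[[str], str] = str.strip,
) -> float:
    """Return 1.0 if the forward model maps the reactants back to the product, else 0.0.

    Assumption: the reward is binary and molecules are compared as strings
    after `canonicalize` (a placeholder for real SMILES canonicalization).
    """
    # Join reactants with '.', the usual SMILES separator for multiple molecules.
    reactant_smiles = ".".join(r.strip() for r in predicted_reactants if r.strip())
    if not reactant_smiles:
        return 0.0

    # Forward step: predict the product that these reactants would form.
    forward_product = canonicalize(forward_model(reactant_smiles))
    if not forward_product:
        return 0.0

    # Reward only when the round trip recovers the original input product.
    return 1.0 if forward_product == canonicalize(product) else 0.0


def toy_forward_model(reactants: str) -> str:
    """Hypothetical stand-in: a lookup table instead of a learned forward model."""
    known = {"CC(=O)O.OCC": "CC(=O)OCC"}  # acetic acid + ethanol -> ethyl acetate
    return known.get(reactants, "")


if __name__ == "__main__":
    print(round_trip_reward("CC(=O)OCC", ["CC(=O)O", "OCC"], toy_forward_model))
    print(round_trip_reward("CC(=O)OCC", ["CC(=O)O", "CO"], toy_forward_model))
```

Because the reward depends only on whether the forward-predicted product matches the input, any reactant set that the forward model maps back to the product is rewarded, not just the single reference answer in the dataset.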

Paper Structure (79 sections, 10 equations, 17 figures, 12 tables)

This paper contains 79 sections, 10 equations, 17 figures, 12 tables.

Figures (17)

  • Figure 1: (Left) Comparison of reasoning processes among Molecular Reasoning LLMs, Molecular Prediction LLMs, and RetroReasoner (Ours). RetroReasoner suggests valid reactants for a given product through explicit strategic disconnection steps. (Right) Comparison of molecular reasoning LLMs. The x-axis represents the feasible ratio of reactant proposals, and the y-axis represents the diversity of proposals. Circle size indicates model size.
  • Figure 2: A schematic diagram of the generation process of SyntheticRetro, a chemist's strategy-based reasoning data generation process.
  • Figure 3: Learning curve comparing the effect of the number of linking texts used during SFT. In the default setting, 15 linking texts are generated per instance, with a different linking text used in each epoch. This is compared to the case where only one linking text $(n=1)$ is used consistently across all epochs. The accuracy of each structured reasoning step and the Exact@1↑ metric for reactant prediction are shown. The x-axis represents the number of parameter updates.
  • Figure 4: Schematic of the asynchronous data generation system in SyntheticRetro. A producer constructs prompts for linking text generation from ORDerly RXN SMILES and pushes them into a prompt queue, which is then processed asynchronously by multiple vLLM servers. Each vLLM server has a fixed maximum number of concurrent requests, and generation proceeds by filling these slots asynchronously.
  • Figure 5: Example of a reasoning process generated by SyntheticRetro.
  • ...and 12 more figures