Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMs

Nguyen Xuan-Vu, Daniel Armstrong, Milena Wehrbach, Andres M Bran, Zlatko Jončev, Philippe Schwaller

TL;DR

Computer-aided synthesis planning (CASP) struggles to integrate chemist feedback and to assess route feasibility. Synthelite addresses both with a two-phase framework: in Phase 1, an LLM drafts a retrosynthetic blueprint; in Phase 2, a similarity-based Monte Carlo Tree Search realizes routes that follow the blueprint and satisfy stock constraints. The framework is highly steerable by expert prompts, handles starting-material constraints effectively, accounts for feasibility during route design, and achieves competitive solve rates on USPTO benchmarks. The work points to a practical path toward LLM-centric orchestration of synthesis planning, while acknowledging current limitations of closed LLMs and gaps in template matching.

Abstract

Computer-aided synthesis planning (CASP) has long been envisioned as a complementary tool for synthetic chemists. However, existing frameworks often lack mechanisms to allow interaction with human experts, limiting their ability to integrate chemists' insights. In this work, we introduce Synthelite, a synthesis planning framework that uses large language models (LLMs) to directly propose retrosynthetic transformations. Synthelite can generate end-to-end synthesis routes by harnessing the intrinsic chemical knowledge and reasoning capabilities of LLMs, while allowing expert intervention through natural language prompts. Our experiments demonstrate that Synthelite can flexibly adapt its planning trajectory to diverse user-specified constraints, achieving up to 95% success rates in both strategy-constrained and starting-material-constrained synthesis tasks. Additionally, Synthelite exhibits the ability to account for chemical feasibility during route design. We envision Synthelite to be both a useful tool and a step toward a paradigm where LLMs are the central orchestrators of synthesis planning.

Paper Structure

This paper contains 25 sections, 1 equation, 22 figures, 2 tables.

Figures (22)

  • Figure 1: a) An example of the task of expert-prompted synthesis planning. b) Conceptual overview of Synthelite, consisting of two phases: in Phase 1, the LLM drafts a strategy for each reaction step; in Phase 2, the framework searches for a combination of reactions that leads the synthesis to in-stock materials while staying aligned with the strategy the LLM proposed in Phase 1. c) A close-up of Phase 1. For each step of the synthesis, the LLM produces a textual description of the next retro step, which is then used to retrieve relevant reactions from an LLM-annotated template database. The LLM then chooses the most suitable reaction to update the synthesis.
  • Figure 2: a) Metric definition: A pair of target and prompt is considered to be solved successfully if the model can find at least 1 route that passes the validation scripts. b) Precision and recall of Synthelite, varying the LLM, compared with the AZF baseline. c) An example showing how Synthelite can adapt to different expert prompts. Routes are produced using .
  • Figure 3: Solve rate on a subset of 20 targets from Pistachio Reachable, with and without starting material constraint.
  • Figure 4: a) Distribution of feasibility scores for the top-3 routes across eight targets, evaluated by . b) Example illustrating the chemical feasibility awareness of the Synthelite framework, where the LLM refines its synthesis strategy across successive attempts. Example shown for target Synthelite 2, with routes generated by .
  • Figure 5: Effect of multi-attempt Phase 1 and of the Phase 2 MCTS on the solve rate of Synthelite on USPTO-190.
  • ...and 17 more figures
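
The success criterion in Figure 2a (a target and prompt pair counts as solved if at least one route passes the validation scripts) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the data layout, the names `RouteChecks`, `is_solved`, and `solve_rate`, and the choice to return 0.0 for an empty benchmark are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption, not taken from the paper): each
# (target, prompt) pair maps to one boolean per candidate route, True when
# that route passed the validation script.
RouteChecks = Dict[Tuple[str, str], List[bool]]


def is_solved(route_passes: List[bool]) -> bool:
    """A pair counts as solved if at least one route passes validation."""
    return any(route_passes)


def solve_rate(results: RouteChecks) -> float:
    """Fraction of (target, prompt) pairs with at least one passing route."""
    if not results:
        # Assumed convention for an empty benchmark.
        return 0.0
    solved = sum(is_solved(checks) for checks in results.values())
    return solved / len(results)


if __name__ == "__main__":
    # Placeholder identifiers; real entries would be target molecules and
    # expert prompts.
    example: RouteChecks = {
        ("target_A", "prompt_1"): [False, True, False],  # solved
        ("target_A", "prompt_2"): [False, False],        # not solved
        ("target_B", "prompt_1"): [True],                # solved
        ("target_B", "prompt_2"): [],                    # no routes found
    }
    print(f"solve rate: {solve_rate(example):.2f}")
```

Under this definition a pair with several failing routes and a single passing one still counts as solved, so the solve rate measures whether a valid route is found at all, not how many of the proposed routes are valid.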