Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis

Kexin Chen; Jiamin Lu; Junyou Li; Xiaoran Yang; Yuyang Du; Kunyi Wang; Qiannuan Shi; Jiahui Yu; Lanqing Li; Jiezhong Qiu; Jianzhang Pan; Yi Huang; Qun Fang; Pheng Ann Heng; Guangyong Chen

TL;DR

Chemist-X automates reaction condition optimization by combining retrieval-augmented generation (RAG) over up-to-date online chemical knowledge, CAD-assisted condition design, and autonomous wet-lab execution. Its three-phase workflow retrieves relevant literature and molecular data, designs promising conditions with a CL-SCL-based fingerprint and CAD tools, and validates them through LLM-guided robotic experiments. Results show strong Phase One code-generation performance, better fingerprint-based optimization in Phase Two, and high-yielding autonomous Suzuki-Miyaura reactions in Phase Three, pointing to a viable path toward AI-supervised chemical synthesis. The work advances automated synthesis by integrating real-time knowledge access, improved reaction representations, and end-to-end robotic experimentation.

Abstract

Recent AI research points to a promising future of automated chemical reactions within the chemistry community. This study proposes Chemist-X, a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis with retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment execution. First, emulating how chemical experts approach the RCO task, Chemist-X uses a novel RAG scheme to interrogate available molecular and literature databases and narrow the search space for later processing. The agent then uses a computer-aided design (CAD) tool we have developed, operating it through a large language model (LLM)-supervised programming interface. With up-to-date chemical knowledge obtained via RAG and the ability to use CAD tools, our agent significantly outperforms conventional RCO AIs, which are confined to the fixed knowledge in their training data. Finally, Chemist-X interacts with the physical world through an automated robotic system that can validate the suggested reaction conditions without human intervention. The robotic system is controlled with a novel algorithm we developed for the equipment, which relies on LLMs for reliable script generation. Results of our automated wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the loop, demonstrate Chemist-X's capability for self-driving laboratories.

Paper Structure (15 sections, 7 equations, 4 figures, 2 tables)

Introduction
Results and Discussions
Phase One Unit Tests
Phase Two Unit Tests
Phase Three Wet-Lab Validations
Experimental Setup
Experimental Details and Discussions
Method
Phase One: Information Retrieval Using Molecule and Literature Databases
Phase Two: Reaction Condition Optimization with Pre-packaged Fingerprint Tool
Reaction Data Processing
The CL-SCL Network
Phase Three: Automated Laboratory Execution with LLM-Supervised Interface Control
Conclusion
Extended Data

Figures (4)

Figure 1: The three-phase RCO framework of Chemist-X, in which all phases are automatically executed under the control of the LLM agent.
Figure 2: Experimental details of Phase One's unit test. a) The prompt used for automatic code generation in this experiment; $<...>$ is a placeholder for the flexible part of the prompt, which is selected with the TMS scheme. b) and c) Similarity scores between each documentation slice and the fixed part of the prompt, calculated with cosine distance and L2 distance, respectively. d) to h) Comparison of four prompting schemes (zero-shot, all-document, random-slice, and TMS prompting) in terms of cost and performance.
Figure 3: Overview and procedure of the studied Suzuki-Miyaura reactions. a) The chemical space of the studied Suzuki-Miyaura reaction. b) Structures of the phosphorus ligands shown in a). c) Unchained Lab in the IC platform, equipped with modules for storing solid and liquid reagents and solvents, pipettes for viscous or trace liquid reagents, vials for different experiments, an analytical balance, and reactors (OSR). d) Reactors (OSR) of the Unchained Lab with inert gas protection. e) The orbital robot and on-island robotic arm of the IC platform, which perform automated transfer operations. f) Product identification and yield analysis with the HPLC-MS and HPLC systems. g) Calculation of the Relative Response Factor with the internal standard method. h) Yield analysis of each reaction with HPLC.
Figure 4: Technical details of the LLM-supervised control script generation for wet-lab equipment. a) Our prompting framework. b) Ablation study results for the mouse-clicking algorithm. c) Parameter sensitivity of the LLM-supervised system.
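Figure 2 b) and c) rank documentation slices by their similarity to the fixed part of the prompt. A minimal sketch of such a ranking step follows, assuming the prompt and each slice have already been embedded as vectors (the embedding model is not specified here) and that the top-k slices fill the `<...>` placeholder. The function names and the top-k rule are hypothetical illustrations, not the authors' TMS implementation.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating similarity-based slice selection;
# not the authors' TMS code. Embeddings are assumed to be precomputed.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_slices(prompt_vec: Sequence[float],
                  slice_vecs: List[Sequence[float]],
                  k: int,
                  metric: str = "cosine") -> List[int]:
    """Return indices of the k slices closest to the fixed prompt part."""
    if metric == "cosine":
        # Negate similarity so that an ascending sort puts the best slice first.
        keys = [-cosine_similarity(prompt_vec, v) for v in slice_vecs]
    elif metric == "l2":
        keys = [l2_distance(prompt_vec, v) for v in slice_vecs]
    else:
        raise ValueError(f"unknown metric: {metric}")
    # Ties are broken by the original slice order.
    order = sorted(range(len(slice_vecs)), key=lambda i: (keys[i], i))
    return order[:k]


def fill_placeholder(template: str, slices: List[str], chosen: List[int]) -> str:
    """Replace the <...> placeholder with the chosen documentation slices."""
    return template.replace("<...>", "\n\n".join(slices[i] for i in chosen))


if __name__ == "__main__":
    prompt_vec = [1.0, 0.0, 0.0]
    slice_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.0, 0.7]]
    docs = ["slice A", "slice B", "slice C"]
    top = select_slices(prompt_vec, slice_vecs, k=2)
    print(top)  # [0, 2]
    print(fill_placeholder("Docs:\n<...>\nWrite the script.", docs, top))
```

With these toy vectors, cosine similarity and L2 distance produce the same ranking; on real embeddings the two metrics can disagree, which is presumably why both are reported in Figure 2.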
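Figure 3 g) and h) rely on the internal standard method to turn HPLC peak areas into yields. Under one common convention, the relative response factor (RRF) is the product's detector response per unit amount divided by that of the internal standard (IS), measured on a calibration sample. The product amount in a reaction sample then follows from its peak-area ratio to a known amount of IS. The sketch below assumes this convention, molar amounts, and 1:1 stoichiometry between the limiting reagent and the product. The paper's exact equations may differ (some labs define RRF as the inverse ratio), and the function names and numbers are hypothetical.

```python
# Hypothetical sketch of the internal standard method under one common
# convention; the paper's own equations may use a different form.


def relative_response_factor(area_product: float, amount_product: float,
                             area_is: float, amount_is: float) -> float:
    """RRF from a calibration sample with known amounts of pure product
    and internal standard (IS): response per unit amount, product vs IS."""
    return (area_product / amount_product) / (area_is / amount_is)


def product_amount(area_product: float, area_is: float,
                   amount_is: float, rrf: float) -> float:
    """Amount of product in a reaction sample spiked with a known amount of IS."""
    return (area_product / area_is) * amount_is / rrf


def yield_percent(amount_product: float, amount_limiting: float) -> float:
    """Yield relative to the theoretical amount set by the limiting reagent
    (assumes 1:1 stoichiometry between limiting reagent and product)."""
    return 100.0 * amount_product / amount_limiting


if __name__ == "__main__":
    # Illustrative numbers only (not data from the paper); amounts in mmol.
    rrf = relative_response_factor(area_product=1500.0, amount_product=0.10,
                                   area_is=1000.0, amount_is=0.10)
    n_prod = product_amount(area_product=1200.0, area_is=1000.0,
                            amount_is=0.05, rrf=rrf)
    print(f"RRF = {rrf:.2f}, yield = {yield_percent(n_prod, 0.05):.1f} %")
    # RRF = 1.50, yield = 80.0 %
```

Because the RRF is a ratio of responses, any consistent amount unit works, provided the calibration and reaction samples use the same basis.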
