From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning

Cheng Yang, Jiaxuan Lu, Haiyuan Wan, Junchi Yu, Feiwei Qin

Abstract

Reaction condition recommendation, the task of selecting suitable condition parameters for a chemical reaction, is pivotal to accelerating chemical science. With the rapid development of large language models (LLMs), there is growing interest in leveraging their reasoning and planning capabilities for this task. Despite their success, existing methods rarely explain the rationale behind the conditions they recommend, which limits their utility in high-stakes scientific workflows. In this work, we propose ChemMAS, a multi-agent system that reframes condition prediction as an evidence-based reasoning task. ChemMAS decomposes the task into four stages: mechanistic grounding, multi-channel recall, constraint-aware agentic debate, and rationale aggregation. Each decision is backed by interpretable justifications grounded in chemical knowledge and retrieved precedents. Experiments show that ChemMAS achieves 20-35% gains over domain-specific baselines and outperforms general-purpose LLMs by 10-15% in Top-1 accuracy, while offering falsifiable, human-trustable rationales. This establishes a new paradigm for explainable AI in scientific discovery.

Paper Structure

This paper contains 61 sections, 20 equations, 10 figures, 5 tables, 2 algorithms.

Figures (10)

  • Figure 1: Overview of ChemMAS. A collaborative multi-agent system for evidence-based reaction-condition reasoning from SMILES inputs. ChemMAS demonstrates strong versatility and delivers state-of-the-art performance on reaction condition reasoning.
  • Figure 2: Architecture of ChemMAS. The left side shows how the General Chemist processes SMILES and Multi-Channel Recall retrieves reaction conditions from the Reaction Base. On the right, candidate conditions are paired and evaluated through Multi-Agent Debate, where four agents with Multi-Step Reasoning select the top-50 conditions via Tournament Selection.
  • Figure 3: Two-stage Multi-tool Collaborative Training Framework of ChemMAS. Chemical Teaching uses SFT for cold-start training, enabling the LLM to master TIR, and Tool Incentivization employs RL to align the model’s policy with both answer correctness and collaborative tool usage.
  • Figure 4: Model Interpretability Evaluation and Scoring Methodology. (Left) Accuracy of ChemMAS outputs compared to human expert annotations. (Center) Human alignment performance comparison; blue bars indicate LLM-Scores and green bars indicate BLEU-4 scores. (Right) Schematic representation of the LLM-Score pipeline and the question-answering based evaluation workflow.
  • Figure 5: Multi-agent ablation: Top-1 similarity improvements across Catalyst, Solvent1/2, and Reagent1/2 when adding specialized agents on top of $\mathcal{A}_{Gen}$+$\mathcal{A}_{Full}$.
  • ...and 5 more figures
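
The Figure 2 caption describes candidate conditions being paired, judged by four debating agents, and narrowed to a top-k set through Tournament Selection. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the ChemMAS implementation: the names `Judge`, `debate_winner`, and `tournament_select`, the majority-vote rule, the tie-breaking rule, and the bye handling are all hypothetical, and each agent is reduced to a simple pairwise-preference function.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a debating agent: returns True when it
# prefers candidate `a` over candidate `b`.
Judge = Callable[[str, str], bool]


def debate_winner(a: str, b: str, judges: Sequence[Judge]) -> str:
    """Pick the winner of one pairing by majority vote (assumed rule)."""
    votes_for_a = sum(1 for judge in judges if judge(a, b))
    # Assumption: a tied vote keeps the first candidate of the pair.
    return a if 2 * votes_for_a >= len(judges) else b


def tournament_select(
    candidates: Sequence[str], judges: Sequence[Judge], k: int
) -> List[str]:
    """Eliminate losers of pairwise debates until exactly k candidates remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    pool = list(candidates)
    while len(pool) > k:
        # Play only as many pairings as are needed to reach k survivors.
        n_pairs = min(len(pool) // 2, len(pool) - k)
        winners = [
            debate_winner(pool[2 * i], pool[2 * i + 1], judges)
            for i in range(n_pairs)
        ]
        # Unpaired candidates advance without a debate (a bye).
        pool = winners + pool[2 * n_pairs:]
    return pool


# Toy usage: eight made-up candidates with made-up scores, and four
# identical judges that simply prefer the higher score.
scores = {f"c{i}": i for i in range(1, 9)}
judges = [lambda a, b: scores[a] >= scores[b]] * 4
print(tournament_select(list(scores), judges, k=2))  # ['c4', 'c8']
```

In this toy run the second-best candidate, c7, is eliminated because it meets c8 in the first round. A single-elimination bracket guarantees only that the overall best candidate survives, so the final set depends on how candidates are paired.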