ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving

Botao Yu; Frazier N. Baker; Ziru Chen; Garrett Herb; Boyu Gou; Daniel Adu-Ampratwum; Xia Ning; Huan Sun

TL;DR

ChemToolAgent (CTA) investigates whether augmenting LLMs with domain-specific tools improves chemistry problem solving. Built on the ReAct framework, CTA uses 29 tools to handle a broad range of tasks and is evaluated on specialized chemistry tasks (e.g., SMolInstruct) and general chemistry benchmarks (MMLU-Chemistry, SciBench-Chemistry, GPQA-Chemistry). Results show substantial gains on specialized, tool-heavy tasks but no consistent advantage over base LLMs on general questions: tool augmentation helps on some task types but can get in the way of broader reasoning. An error analysis attributes the mixed performance to tool-related failures and cognitive load, and points future work toward better tool design, reasoning verification, and load management through multi-agent or information-verification strategies.

Abstract

To enhance large language models (LLMs) for chemistry problem solving, several LLM-based agents augmented with tools have been proposed, such as ChemCrow and Coscientist. However, their evaluations are narrow in scope, leaving a large gap in understanding the benefits of tools across diverse chemistry tasks. To bridge this gap, we develop ChemToolAgent, an enhanced chemistry agent over ChemCrow, and conduct a comprehensive evaluation of its performance on both specialized chemistry tasks and general chemistry questions. Surprisingly, ChemToolAgent does not consistently outperform its base LLMs without tools. Our error analysis with a chemistry expert suggests that for specialized chemistry tasks, such as synthesis prediction, we should augment agents with specialized tools; for general chemistry questions like those in exams, however, agents' ability to reason correctly with chemistry knowledge matters more, and tool augmentation does not always help.
