
ChemHTS: Hierarchical Tool Stacking for Enhancing Chemical Agents

Zhucong Li, Jin Xiao, Bowei Zhang, Zhijian Zhou, Qianyu He, Fenglei Cao, Jiaqing Liang, Yuan Qi

TL;DR

This work tackles two bottlenecks in tool-augmented LLMs for chemistry: reliance on a single tool and poor cross-tool collaboration. It introduces ChemHTS, a hierarchical tool-stacking framework with a self-stacking warmup stage and a multi-layer optimization stage that together discover optimal tool invocation pathways for diverse chemical tasks. Across four classic chemistry tasks, ChemHTS-based stacking agents outperform strong baselines and provide interpretable tool-usage patterns, demonstrating the value of tool collaboration. The authors publicly release the dataset and code, enabling reproducibility and further exploration of hierarchical tool stacking in scientific AI.

Abstract

Large Language Models (LLMs) have demonstrated remarkable potential in scientific research, particularly in chemistry-related tasks such as molecular design, reaction prediction, and property estimation. Tool-augmented LLMs have been introduced to enhance reasoning and computation in these domains, but existing approaches suffer from tool invocation errors and lack effective collaboration among diverse tools, which limits their overall performance. To address these challenges, we propose ChemHTS (Chemical Hierarchical Tool Stacking), a novel method that optimizes tool invocation pathways through a hierarchical stacking strategy. ChemHTS consists of two key stages, tool self-stacking warmup and multi-layer decision optimization, which enable LLMs to refine tool usage dynamically. We evaluate ChemHTS on four classical chemistry tasks and demonstrate its superiority over strong baselines, including general-purpose models such as GPT-4o and DeepSeek-R1 and chemistry-specific models such as ChemDFM. Furthermore, we define four distinct tool-stacking behaviors to enhance interpretability, providing insights into the effectiveness of tool collaboration. Our dataset and code are publicly available at https://github.com/Chang-pw/ChemHTS.
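The abstract describes the two ChemHTS stages only at a high level. The following is a minimal sketch of what a two-stage, layer-by-layer pathway search could look like, assuming that a pathway is an ordered stack of tool-equipped agents, that a validation scorer is available for any pathway, and that both stages are greedy. The function names (`self_stacking_warmup`, `multi_layer_optimize`, `toy_score`), the greedy strategy, and all scores are hypothetical illustrations, not the authors' implementation; the tool names "retrieve" and "compute" simply mirror the Retrieve and Compute tool categories mentioned in Figure 4.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a pathway is an ordered tuple of tool names,
# where each layer's agent refines the output of the layer below it.
Pathway = Tuple[str, ...]
Scorer = Callable[[Pathway], float]


def self_stacking_warmup(tools: Sequence[str], score: Scorer,
                         max_depth: int = 3) -> List[Tuple[Pathway, float]]:
    """Assumed stage 1: stack each tool on itself (1..max_depth layers) and
    keep each tool's best depth, sorted from best to worst score."""
    seeds = []
    for tool in tools:
        stacks = [((tool,) * d, score((tool,) * d)) for d in range(1, max_depth + 1)]
        seeds.append(max(stacks, key=lambda item: item[1]))
    return sorted(seeds, key=lambda item: item[1], reverse=True)


def multi_layer_optimize(seeds: List[Tuple[Pathway, float]], tools: Sequence[str],
                         score: Scorer, max_layers: int = 4) -> Tuple[Pathway, float]:
    """Assumed stage 2: greedily add one tool layer at a time to each seed,
    keeping a new layer only if it raises the validation score."""
    best_path, best_score = seeds[0]  # seeds are sorted, so this is the best seed
    for path, current in seeds:
        while len(path) < max_layers:
            options = [(path + (t,), score(path + (t,))) for t in tools]
            next_path, next_score = max(options, key=lambda item: item[1])
            if next_score <= current:
                break  # no further layer helps this pathway
            path, current = next_path, next_score
        if current > best_score:
            best_path, best_score = path, current
    return best_path, best_score


# Made-up validation scores for illustration only (not results from the paper).
TOY_SCORES: Dict[Pathway, float] = {
    ("retrieve",): 0.40, ("retrieve",) * 2: 0.45, ("retrieve",) * 3: 0.44,
    ("compute",): 0.35, ("compute",) * 2: 0.30, ("compute",) * 3: 0.28,
    ("retrieve", "retrieve", "compute"): 0.60,
    ("retrieve", "retrieve", "compute", "retrieve"): 0.58,
}


def toy_score(path: Pathway) -> float:
    # Unlisted pathways get a score of 0.0 in this toy example.
    return TOY_SCORES.get(path, 0.0)


if __name__ == "__main__":
    tools = ("retrieve", "compute")
    seeds = self_stacking_warmup(tools, toy_score)
    print(seeds)  # [(('retrieve', 'retrieve'), 0.45), (('compute',), 0.35)]
    print(multi_layer_optimize(seeds, tools, toy_score))
    # (('retrieve', 'retrieve', 'compute'), 0.6)
```

In this toy run, warmup picks a two-layer self-stack of the retrieval tool, and the multi-layer stage then finds that adding a computation layer on top improves the score, while a fourth layer does not. A real system would replace `toy_score` with task-level evaluation of the stacked agents, which is far more expensive, so the greedy, early-stopping search here is only one plausible way to keep the number of evaluated pathways small.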

Paper Structure

This paper contains 57 sections, 11 figures, 11 tables, 1 algorithm.

Figures (11)

  • Figure 1: Using the text-based molecule design task as an example, this analysis examines issues in the model's use of the Name2SMILES and ChemDFM tools.
  • Figure 2: Our ChemHTS method framework diagram. For each chemical task, we identify the optimal tool-stacking pathway through the ChemHTS method for subsequent task inference execution.
  • Figure 3: Performance comparison of six multi-agent systems with different communication structures against our optimal stacking-agent pathway on the text-based molecule design task.
  • Figure 4: An example of the naming rule, in which dedicated icons represent the Agent and two further icons represent the Retrieve tool and the Compute tool, respectively.
  • Figure 5: Demonstration of the chain structure.
  • ...and 6 more figures