RefTool: Enhancing Model Reasoning with Reference-Guided Tool Creation
Xiao Liu, Da Yin, Zirui Wu, Yansong Feng
TL;DR
This work introduces RefTool, a two-stage framework that (1) constructs executable tools from structured external references (e.g., textbooks) and (2) uses a hierarchical toolbox to select and apply these tools when solving problems. By grounding both tool creation and tool selection in reference material, RefTool extends LLM reasoning beyond the model's internal knowledge and performs robustly across causality, physics, and chemistry benchmarks. Empirically, RefTool improves average accuracy by 11.3% over baselines, with substantial cost savings and strong tool reusability, while human evaluation confirms high tool quality and alignment with expert judgments. The study shows that external-reference grounding supports generalizable, dataset-agnostic tool creation and yields practical gains on complex, knowledge-intensive reasoning tasks.
Abstract
Tools enhance the reasoning capabilities of large language models (LLMs) in complex problem-solving tasks, but not all tasks have available tools. In the absence of predefined tools, prior works have explored instructing LLMs to generate tools on their own. However, such approaches rely heavily on the models' internal knowledge and fail in domains beyond the LLMs' knowledge scope. To address this limitation, we propose RefTool, a reference-guided framework for automatic tool creation that leverages structured external materials such as textbooks. RefTool consists of two modules: (1) tool creation, where LLMs generate executable tools from reference content, validate them using illustrative examples, and organize them hierarchically into a toolbox; and (2) tool utilization, where LLMs navigate the toolbox structure to select and apply the appropriate tools to solve problems. Experiments on causality, physics, and chemistry benchmarks demonstrate that RefTool outperforms existing tool-creation and domain-specific reasoning methods by 11.3% in average accuracy, while being cost-efficient and broadly generalizable. Analyses reveal that grounding tool creation in references produces accurate and faithful tools, and that the hierarchical structure facilitates effective tool selection. RefTool enables LLMs to overcome knowledge limitations, demonstrating the value of grounding tool creation in external references for enhanced and generalizable reasoning.
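The two modules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`Tool`, `Toolbox`, `keyword_chooser`, the chapter and section labels) is hypothetical, the two-level chapter/section hierarchy is an assumption, and hand-written functions plus a keyword matcher stand in for the LLM calls that would generate tools and navigate the toolbox.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Tool:
    """An executable tool plus one illustrative example used to validate it."""
    name: str
    description: str
    func: Callable[..., float]
    # (args, expected_output), as might be taken from a worked example in a reference.
    example: Tuple[tuple, float]

    def validate(self, tol: float = 1e-6) -> bool:
        args, expected = self.example
        try:
            return abs(self.func(*args) - expected) <= tol
        except Exception:
            return False  # a tool that crashes on its own example is rejected


@dataclass
class Toolbox:
    """Assumed two-level hierarchy: chapter -> section -> tools."""
    tree: Dict[str, Dict[str, List[Tool]]] = field(default_factory=dict)

    def add(self, chapter: str, section: str, tool: Tool) -> bool:
        # Tool creation: only tools that reproduce their example are stored.
        if not tool.validate():
            return False
        self.tree.setdefault(chapter, {}).setdefault(section, []).append(tool)
        return True

    def select(self, choose: Callable[[List[str]], str]) -> Tool:
        # Tool utilization: descend the hierarchy one level at a time.
        chapter = choose(list(self.tree))
        section = choose(list(self.tree[chapter]))
        tools = {t.name: t for t in self.tree[chapter][section]}
        return tools[choose(list(tools))]


def keyword_chooser(question: str) -> Callable[[List[str]], str]:
    """Stand-in for an LLM: pick the first option whose name appears in the question."""
    q = question.lower()

    def choose(options: List[str]) -> str:
        for opt in options:
            if opt.lower().replace("_", " ") in q:
                return opt
        return options[0]

    return choose


def kinetic_energy(mass: float, velocity: float) -> float:
    return 0.5 * mass * velocity ** 2


def broken_momentum(mass: float, velocity: float) -> float:
    return mass + velocity  # deliberately wrong, to show validation rejecting it


toolbox = Toolbox()
added_ok = toolbox.add(
    "Mechanics", "Energy",
    Tool("kinetic_energy", "E = 0.5 * m * v^2", kinetic_energy, ((2.0, 3.0), 9.0)),
)
added_bad = toolbox.add(
    "Mechanics", "Momentum",
    Tool("momentum", "p = m * v", broken_momentum, ((2.0, 3.0), 6.0)),
)

question = "In mechanics, what is the kinetic energy of a 4 kg mass moving at 5 m/s?"
tool = toolbox.select(keyword_chooser(question))
result = tool.func(4.0, 5.0)
print(added_ok, added_bad, tool.name, result)
```

In this sketch the faulty tool never enters the toolbox because it fails its own illustrative example, and selection only ever compares a handful of names at each level rather than scanning every tool at once.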
