Multi-Field Tool Retrieval
Yichen Tang, Weihang Su, Yiqun Liu, Qingyao Ai
TL;DR
This work tackles the core bottleneck of tool retrieval for LLM-enabled agents by arguing that tool utility is multi-faceted and cannot be captured by treating tool documentation as flat text. It introduces Multi-Field Tool Retrieval (MFTR), which standardizes tool documentation into four fields, rewrites user queries to align with these fields, and adaptively weights field-level relevance while penalizing missing parameters to enforce executability. In extensive experiments on five tool-retrieval benchmarks and a mixed large-scale benchmark, MFTR achieves state-of-the-art results and generalizes well across diverse retrievers and datasets. The findings show that fine-grained, multi-field relevance modeling significantly improves tool selection accuracy and robustness in real-world, heterogeneous tool repositories, enabling more reliable tool-using agents.
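To make the scoring idea concrete, the following is a minimal sketch of combining field-level relevance scores with a missing-parameter penalty. It is not the authors' implementation: the field names, the normalized weighted sum, the linear penalty, and the names `ToolMatch`, `combine`, and `rank` are all assumptions chosen for illustration, and the per-field scores are assumed to come from some upstream retriever.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolMatch:
    """Hypothetical container for one query-tool comparison."""
    name: str
    field_scores: dict[str, float]   # field name -> relevance score (assumed in [0, 1])
    required_params: set[str]        # parameters the tool needs in order to run
    provided_params: set[str]        # parameters recoverable from the user query


def combine(match: ToolMatch, weights: dict[str, float],
            missing_penalty: float = 0.1) -> float:
    """Weighted average of field scores minus a penalty per missing parameter.

    The linear penalty is an assumption; the source only states that missing
    parameters are penalized to enforce executability.
    """
    total_weight = sum(weights.get(f, 0.0) for f in match.field_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights.get(f, 0.0) * s
                   for f, s in match.field_scores.items()) / total_weight
    missing = match.required_params - match.provided_params
    return weighted - missing_penalty * len(missing)


def rank(matches: list[ToolMatch], weights: dict[str, float]) -> list[str]:
    """Return tool names sorted by combined score, best first."""
    return [m.name for m in
            sorted(matches, key=lambda m: combine(m, weights), reverse=True)]
```

In this sketch the weights are passed in as fixed numbers; an adaptive scheme, as the summary describes, would instead compute them per query.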
Abstract
Integrating external tools enables Large Language Models (LLMs) to interact with real-world environments and solve complex tasks. Given the growing scale of available tools, effective tool retrieval is essential to mitigate the constraints of LLMs' context windows and ensure computational efficiency. Existing approaches typically treat tool retrieval as a traditional ad-hoc retrieval task, matching user queries against the entire raw tool documentation. In this paper, we identify three fundamental challenges that limit the effectiveness of this paradigm: (i) the incompleteness and structural inconsistency of tool documentation; (ii) the significant semantic and granular mismatch between user queries and technical tool documents; and, most importantly, (iii) the multi-aspect nature of tool utility, which involves distinct dimensions, such as functionality, input constraints, and output formats, that vary in format and importance. To address these challenges, we introduce Multi-Field Tool Retrieval, a framework designed to align user intent with tool representations through fine-grained, multi-field modeling. Experimental results show that our framework achieves state-of-the-art (SOTA) performance on five datasets and a mixed benchmark, exhibiting superior generalizability and robustness.
