Alignment for Efficient Tool Calling of Large Language Models
Hongshen Xu, Zihan Wang, Zichen Zhu, Lei Pan, Xingyu Chen, Lu Chen, Kai Yu
TL;DR
This work addresses when LLMs should call external tools by modeling knowledge boundaries as a probabilistic, uncertain region rather than a binary known/unknown state. It introduces a multi-objective alignment framework that balances helpfulness against tool cost, supported by two knowledge-boundary estimation methods (consistency-based and absolute) and two training strategies (implicit and explicit modeling). Empirical results across calculator, retrieval-based QA, and complex reasoning tasks show substantial reductions in unnecessary tool usage while maintaining or improving accuracy, with explicit modeling offering flexible inference-time control. The framework advances practical tool intelligence by enabling dynamic, cost-aware tool invocation suited to real-world deployment.
Abstract
Recent advancements in tool learning have enabled large language models (LLMs) to integrate external tools, enhancing their task performance by expanding their knowledge boundaries. However, relying on tools often introduces tradeoffs between performance, speed, and cost, and LLMs sometimes exhibit overreliance and overconfidence in tool usage. This paper addresses the challenge of aligning LLMs with their knowledge boundaries so that they make more intelligent decisions about tool invocation. We propose a multi-objective alignment framework that combines probabilistic knowledge boundary estimation with dynamic decision-making, allowing LLMs to better assess when to invoke tools based on their confidence. Our framework includes two methods for knowledge boundary estimation, consistency-based and absolute estimation, and two training strategies for integrating these estimates into the model's decision-making process. Experimental results on various tool invocation scenarios demonstrate the effectiveness of our framework, showing significant improvements in tool efficiency through reduced unnecessary tool usage.
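To make the idea of confidence-driven, cost-aware tool invocation concrete, here is a minimal sketch. It is not the authors' method. It assumes that consistency-based confidence is the agreement rate of the majority answer among several sampled model answers, and it uses a simple expected-reward comparison as the decision rule. The helper names `consistency_confidence` and `should_call_tool` and the parameters `tool_accuracy` and `tool_cost` are all hypothetical.

```python
from collections import Counter


def consistency_confidence(sampled_answers):
    """Hypothetical consistency-based confidence estimate.

    Returns the majority answer and the fraction of samples that agree with it.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(sampled_answers)
    majority, freq = counts.most_common(1)[0]
    return majority, freq / len(sampled_answers)


def should_call_tool(confidence, tool_accuracy=1.0, tool_cost=0.3):
    """Hypothetical cost-aware decision rule (an assumption, not from the paper).

    Expected reward when answering directly: the model's confidence.
    Expected reward when calling the tool: the tool's accuracy minus its cost.
    The tool is invoked only when the second strictly exceeds the first.
    """
    return tool_accuracy - tool_cost > confidence


# Usage: a consistent model answers directly; an inconsistent one calls the tool.
answer, conf = consistency_confidence(["42", "42", "42", "41", "42"])
print(answer, conf, should_call_tool(conf))      # confident: answer directly

answer2, conf2 = consistency_confidence(["a", "b", "c", "a", "d"])
print(answer2, conf2, should_call_tool(conf2))   # uncertain: invoke the tool
```

Because `tool_cost` is passed at call time, the same confidence estimate can yield different decisions under different cost settings. This loosely echoes the idea of adjusting the helpfulness/cost tradeoff at inference time, though the paper's explicit-modeling strategy may work differently.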
