Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding
Kexun Zhang, Hongqiao Chen, Lei Li, William Wang
TL;DR
The paper addresses the challenge that instruction-tuned LLMs struggle to use external tools because of complex syntax constraints. It introduces ToolDec, a decoding algorithm that enforces tool syntax with a finite-state machine (FSM) constructed automatically from tool documentation, optionally complemented by prompt compression that removes syntax details from prompts. At each decoding step, ToolDec samples only from tokens for which the current FSM state $s$ defines a transition, using the renormalized distribution $\tilde{P}(x_t=a \mid x_{1..t-1},s)$ over those tokens, and thereby guarantees syntax-error-free tool calls. Empirically, ToolDec improves performance across five base LLMs on four benchmarks, eliminates all syntax errors, and enables generalist models to reach or surpass specialized tool-use baselines, making it a practical, model-agnostic alternative to fine-tuning for tool use. The work suggests that symbolic, FSM-based constraints can complement or even replace data-intensive customization, with implications for safety and robustness in tool-enabled AI systems.
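The FSM-constrained sampling step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy vocabulary, the hand-written transition table `TRANSITIONS`, the fixed distribution `PROBS` standing in for an LLM, and the helper names `constrain` and `greedy_decode` are all assumptions made for illustration. The sketch assumes the renormalization divides each permitted token's probability by the total probability of all tokens with a defined transition from the current state, and assigns zero to every other token. The uniform fallback for zero permitted mass is an added assumption as well.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical FSM for calls of the form name(arg,arg); keys are (state, token).
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "add"): "name", ("start", "mul"): "name",
    ("name", "("): "arg1",
    ("arg1", "1"): "sep", ("arg1", "2"): "sep",
    ("sep", ","): "arg2",
    ("arg2", "1"): "close", ("arg2", "2"): "close",
    ("close", ")"): "done",
}

# Toy stand-in for the model's next-token distribution (assumption, not real data).
# The most likely token, "hello", is never syntactically valid.
PROBS: Dict[str, float] = {
    "hello": 0.5, "add": 0.2, "mul": 0.1, "(": 0.05,
    "1": 0.05, "2": 0.04, ",": 0.03, ")": 0.03,
}


def constrain(probs: Dict[str, float], state: str,
              transitions: Dict[Tuple[str, str], str]) -> Dict[str, float]:
    """Keep only tokens with a defined transition from `state`, then renormalize."""
    allowed = [tok for (s, tok) in transitions if s == state]
    if not allowed:
        raise ValueError(f"no transition defined from state {state!r}")
    mass = sum(probs.get(tok, 0.0) for tok in allowed)
    if mass == 0.0:
        # Assumed fallback: uniform over permitted tokens.
        return {tok: 1.0 / len(allowed) for tok in allowed}
    return {tok: probs.get(tok, 0.0) / mass for tok in allowed}


def greedy_decode(next_probs: Callable[[List[str]], Dict[str, float]],
                  transitions: Dict[Tuple[str, str], str],
                  start: str, accept: Set[str], max_steps: int = 32) -> List[str]:
    """Greedy decoding that follows the FSM until an accepting state is reached."""
    state, out = start, []
    for _ in range(max_steps):
        if state in accept:
            break
        dist = constrain(next_probs(out), state, transitions)
        tok = max(dist, key=dist.get)       # pick the most likely permitted token
        out.append(tok)
        state = transitions[(state, tok)]   # advance the FSM
    return out


if __name__ == "__main__":
    tokens = greedy_decode(lambda prefix: PROBS, TRANSITIONS, "start", {"done"})
    print("".join(tokens))
```

In this toy setting, unconstrained greedy decoding would emit `hello` at every step, while the constrained loop can only produce strings accepted by the FSM. A real system would apply the same mask to the model's logits over its actual tokenizer vocabulary.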
Abstract
Instruction-tuned large language models (LLMs) excel at many tasks but often fail to use external tools due to complicated and unfamiliar syntax constraints. While extensive fine-tuning and prompting can mitigate the issue, these approaches are expensive and hard to generalize. Furthermore, because syntax constraints are only learned implicitly during fine-tuning, models still make frequent syntax errors. Motivated by the fact that these constraints can be better satisfied explicitly with constrained decoding, we propose TOOLDEC, a decoding algorithm using finite state machines to force LLMs to follow tool syntax. Our experiments show that TOOLDEC eliminates all syntax errors, achieving significantly better performance on various base models and benchmarks. More surprisingly, when applied to generalist out-of-the-box LLMs such as Mistral-Instruct, TOOLDEC improves its accuracy in tool use from the initial 0% to an impressive 52%, matching the performance of specialized fine-tuned models such as ToolLLM.
