Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding

Kexun Zhang; Hongqiao Chen; Lei Li; William Wang

TL;DR

Instruction-tuned LLMs often fail to use external tools because tool calls must satisfy complex syntax constraints. The paper introduces ToolDec, a decoding algorithm that enforces tool syntax with a finite-state machine (FSM) constructed automatically from tool documentation. It can optionally be combined with prompt compression, which removes syntax details from the prompt. At each decoding step, ToolDec samples only from tokens for which the current FSM state $s$ defines a transition, using the renormalized distribution $\tilde{P}(x_t = a \mid x_{1..t-1}, s)$ over those tokens; this guarantees syntax-error-free tool calls. Empirically, ToolDec improves performance across five base LLMs on four benchmarks, eliminates all syntax errors, and enables generalist models to reach or surpass specialized tool-use baselines, making it a practical, model-agnostic alternative to fine-tuning for tool use. The work suggests that symbolic, FSM-based constraints can complement or even replace data-intensive customization, with implications for safety and robustness in tool-enabled AI systems.
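
The masking-and-renormalizing step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary encoding of FSM transitions, and the toy logits are all assumptions made for illustration. Taking a softmax over only the permitted tokens is equivalent to dividing each permitted token's original probability by the total probability mass of the permitted set.

```python
import math
import random


def constrained_distribution(logits, allowed):
    """Renormalize next-token probabilities over the allowed token ids.

    `logits` is a list of raw scores indexed by token id (hypothetical
    model output); `allowed` is the set of token ids that have a
    transition defined in the current FSM state.
    """
    if not allowed:
        raise ValueError("current FSM state has no outgoing transitions")
    # Softmax restricted to allowed ids; subtract the max for stability.
    top = max(logits[i] for i in allowed)
    weights = {i: math.exp(logits[i] - top) for i in allowed}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def constrained_step(logits, state, transitions, rng=random):
    """Sample one permitted token and return (token_id, next_state).

    `transitions` maps state -> {token_id: next_state}; this encoding is
    an assumption of the sketch, not a format taken from the paper.
    """
    edges = transitions[state]
    probs = constrained_distribution(logits, edges.keys())
    tokens = list(probs)
    token = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
    return token, edges[token]


# Toy example: a 4-token vocabulary and a 3-state FSM.
toy_logits = [2.0, 1.0, 0.5, 3.0]
toy_fsm = {"start": {0: "name", 1: "name"}, "name": {2: "end"}}

# Token 3 has the highest logit but is not permitted in "start",
# so it receives no probability mass.
start_probs = constrained_distribution(toy_logits, toy_fsm["start"].keys())
```

Because invalid tokens never receive probability mass, no sampling strategy applied on top of this distribution (greedy, temperature, nucleus) can leave the FSM.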

Abstract

Instruction-tuned large language models (LLMs) excel at many tasks but often fail to use external tools due to complicated and unfamiliar syntax constraints. While extensive fine-tuning and prompting can mitigate the issue, these approaches are expensive and hard to generalize. Furthermore, because syntax constraints are only learned implicitly during fine-tuning, models still make frequent syntax errors. Motivated by the fact that these constraints can be better satisfied explicitly with constrained decoding, we propose ToolDec, a decoding algorithm using finite state machines to force LLMs to follow tool syntax. Our experiments show that ToolDec eliminates all syntax errors, achieving significantly better performance on various base models and benchmarks. More surprisingly, when applied to generalist out-of-the-box LLMs such as Mistral-Instruct, ToolDec improves its accuracy in tool use from the initial 0% to an impressive 52%, matching the performance of specialized fine-tuned models such as ToolLLM.

Paper Structure (17 sections, 1 equation, 7 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Proposed Method: ToolDec
Construction of ToolDec FSM
Prompt Compression
Inferencing with FSM and Compressed Prompt
Experiment
Base LLMs
Benchmarks and Metrics
Results
Ablation Study
Conclusion
Appendix
Examples of Tool Syntax Errors
Algorithm Pseudocode
...and 2 more sections

Figures (7)

Figure 1: On various benchmarks, ToolDec improves both fine-tuned specialist models (ToolLLM) and generalist models (Mistral-Instruct and Vicuna). Mistral-Instruct is improved from an initial 0% to be even better than ToolLLM. ToolDec also eliminates all syntax errors.
Figure 2: Common syntactical modes of failure of tool-use LLMs include reasoning format error, tool name error, and tool argument error. Even fine-tuned models have a significant level of syntax error.
Figure 3: Converting a tool documentation to a simplified prompt and an FSM. The caption on each state in the FSM denotes the valid token set at that step. FSM will transition to the corresponding state when the token on that edge is generated by the LLM.
Figure 4: Linking multiple FSMs to guide the LLM through reasoning and tool selection. After a tool is selected, the FSM then transitions to start generating arguments.
Figure 5: A decoding step using ToolDec FSM. The invalid tokens at the current FSM state are masked out from the token probabilities.
...and 2 more figures
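
The FSM described in the Figure 3 and Figure 4 captions, where each state carries a set of valid tokens and each generated token selects an outgoing edge, can be sketched as a trie over tool names. This is an illustrative reconstruction under stated assumptions, not the paper's construction: the character-level tokenizer, the `END` marker, and all function names are hypothetical, and a real system would use the base model's subword tokenizer.

```python
END = "<end>"  # hypothetical marker that closes a tool name


def build_tool_name_fsm(tool_names, tokenize=list):
    """Build a trie-shaped FSM from a list of tool names.

    Returns (transitions, accepting) where `transitions` maps
    state -> {token: next_state} and `accepting` maps a final state to
    the tool name it completes. `tokenize` defaults to splitting into
    characters, which stands in for a real subword tokenizer.
    """
    transitions = {0: {}}
    accepting = {}
    for name in tool_names:
        state = 0
        for token in tokenize(name) + [END]:
            if token not in transitions[state]:
                new_state = len(transitions)
                transitions[state][token] = new_state
                transitions[new_state] = {}
            state = transitions[state][token]
        accepting[state] = name
    return transitions, accepting


def valid_tokens(transitions, state):
    """Tokens permitted in `state` (the per-state token set)."""
    return sorted(transitions[state])


def step(transitions, state, token):
    """Follow the edge labeled `token`, or return None if it is invalid."""
    return transitions[state].get(token)


def match(transitions, accepting, tokens):
    """Return the tool name spelled by `tokens`, or None if rejected."""
    state = 0
    for token in tokens:
        state = step(transitions, state, token)
        if state is None:
            return None
    return accepting.get(state)


# Toy tool inventory (hypothetical names).
fsm, finals = build_tool_name_fsm(["add", "adjust", "multiply"])
```

Once `match` reaches an accepting state, a linked design in the spirit of Figure 4 would hand control to a second FSM for that tool's arguments; that hand-off is omitted here.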
