FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking

Zhuoer Wang, Leonardo F. R. Ribeiro, Alexandros Papangelis, Rohan Mukherjee, Tzu-Yen Wang, Xinyan Zhao, Arijit Biswas, James Caverlee, Angeliki Metallinou

TL;DR

This paper tackles the problem of generating faithful API calls with large language models, where prior methods struggle with data efficiency and alignment to API documentation. It introduces FANTASE, a two-component framework: State-tracked Constrained Decoding (SCD) that enforces API constraints via a Constrained Token Search Trie during decoding, and a lightweight Reranking module that incorporates supervised signals without large-scale fine-tuning. Experiments on DSTC8 and API Bank show that SCD substantially improves API call accuracy and inference speed, while the RoBERTa-based reranker provides additional gains, leading to state-of-the-art results among smaller backbones and strong performance versus larger models like GPT-4. The approach reduces dependence on extensive in-context prompts and demonstrates practical improvements in faithfulness, efficiency, and context usage for tool-using LLMs, with implications for real-world API integration and safer deployment.

Abstract

API call generation is the cornerstone of large language models' tool-using ability, which provides access to the larger world. However, existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request. To address these limitations, we propose an output-side optimization approach called FANTASE. Two of the unique contributions of FANTASE are its State-Tracked Constrained Decoding (SCD) and Reranking components. SCD dynamically incorporates the appropriate API constraints in the form of a Token Search Trie, which makes generation efficient and guarantees faithfulness to the API documentation. The Reranking component efficiently brings in the supervised signal by using a lightweight model as a discriminator to rerank the beam-searched candidate generations of the large language model. We demonstrate the superior performance of FANTASE in API call generation accuracy, inference efficiency, and context efficiency on the DSTC8 and API Bank datasets.
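The trie-based constraint described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class `TokenTrie`, the function `constrained_greedy`, the subword splits, and the toy scores are all hypothetical, and the state tracking that selects which trie applies at each step is omitted. The five values are the `cuisine` options named in the Figure 3 caption.

```python
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"  # hypothetical end marker closing each allowed sequence


class TokenTrie:
    """Prefix tree over token sequences that the API documentation allows."""

    def __init__(self) -> None:
        self.children: Dict[str, "TokenTrie"] = {}

    def insert(self, tokens: Sequence[str]) -> None:
        node = self
        for tok in list(tokens) + [EOS]:
            node = node.children.setdefault(tok, TokenTrie())

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        """Tokens that may follow `prefix`; empty if the prefix is invalid."""
        node = self
        for tok in prefix:
            if tok not in node.children:
                return []
            node = node.children[tok]
        return sorted(node.children)


def constrained_greedy(trie: TokenTrie,
                       score: Callable[[List[str], str], float]) -> List[str]:
    """Greedy decoding that only ever considers tokens the trie permits."""
    out: List[str] = []
    while True:
        allowed = trie.allowed_next(out)
        best = max(allowed, key=lambda tok: score(out, tok))
        if best == EOS:
            return out
        out.append(best)


# Assumed subword splits; a real tokenizer would split differently.
trie = TokenTrie()
for pieces in (["Amer", "ican"], ["Chin", "ese"], ["Ind", "ian"],
               ["It", "alian"], ["Mex", "ican"]):
    trie.insert(pieces)

# Toy stand-in for model scores: the unconstrained favourite "Thai"
# is not a documented value, so the trie never offers it.
toy_scores = {"Thai": 0.9, "It": 0.6, "Mex": 0.3, "alian": 0.8, "ican": 0.7}
value = "".join(constrained_greedy(trie, lambda prefix, tok: toy_scores.get(tok, 0.0)))
print(value)
```

Because the candidate set at each step is read from the trie rather than the full vocabulary, an undocumented value cannot be produced, whatever score the model assigns to it.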

Paper Structure (18 sections, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Example of an API call that retrieves information based on the user's needs given in the conversation.
  • Figure 2: Illustration of the concepts of Constrained Decoding and Reranking. (Upper half) Constrained Decoding enforces API documentation constraints and considers only the five possible values of cuisine. (Lower half) A lightweight RoBERTa model is used to discriminate and rerank the beam-searched candidate generations.
  • Figure 3: State-tracked Constrained Generation of API Call (showing the step of generating the value of parameter cuisine that has possible values of American, Chinese, Indian, Italian, and Mexican).
  • Figure 4: Comparison of the Generation of Restaurants_1 with Regular Decoding and Decoding with Constrained Token Search Trie.
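
The reranking step shown in the lower half of Figure 2 can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the keyword-overlap function `toy_discriminator` stands in for the RoBERTa discriminator, the candidate calls and log-probabilities are invented, and breaking ties by the generator's log-probability is an assumed design choice.

```python
import re
from typing import Callable, List, Tuple


def toy_discriminator(request: str, call: str) -> float:
    """Hypothetical stand-in for a trained discriminator:
    counts the words shared by the user request and the candidate call."""
    request_words = set(re.findall(r"[a-z]+", request.lower()))
    call_words = set(re.findall(r"[a-z]+", call.lower()))
    return float(len(request_words & call_words))


def rerank(request: str,
           candidates: List[Tuple[str, float]],
           discriminator: Callable[[str, str], float]) -> List[str]:
    """Order beam candidates by discriminator score, highest first.
    Ties fall back to the generator's log-probability (an assumption)."""
    scored = [(discriminator(request, call), logprob, call)
              for call, logprob in candidates]
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [call for _, _, call in scored]


# Invented beam output: (candidate call, generator log-probability).
request = "Find me an Italian restaurant in San Jose"
beam = [
    ("FindRestaurants(cuisine=Mexican, city=San Jose)", -0.4),
    ("FindRestaurants(cuisine=Italian, city=San Jose)", -0.6),
    ("FindRestaurants(cuisine=Italian, city=San Francisco)", -0.9),
]
ranked = rerank(request, beam, toy_discriminator)
print(ranked[0])
```

The generator's top beam candidate is valid under the API documentation but does not match the request; the discriminator moves the matching call to the front without any change to the generator itself.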