LLM-based Frameworks for API Argument Filling in Task-Oriented Conversational Systems
Jisoo Mok, Mohammad Kachuee, Shuyang Dai, Shayan Ray, Tara Taghavi, Sungroh Yoon
TL;DR
This work investigates using Large Language Models (LLMs) for API argument filling in task-oriented dialogue systems and identifies grounding to API schemas and dialogue history as essential to prevent syntax errors and hallucinations. It proposes a dual grounding strategy: (i) an instruction-tuning pipeline with supervised fine-tuning plus rejection sampling for open-source LLMs, with a reward $R = 1 - 2 \cdot N_{\mathrm{Error}} / N_{\mathrm{Total}}$ and $N_{\mathrm{Error}} = N_{\mathrm{NK}} + N_{\mathrm{MK}} + N_{\mathrm{SV}} + N_{\mathrm{HV}}$ to select high-quality samples, and (ii) a multi-step prompting scheme for closed-source LLMs that fills one argument at a time. Empirical results on STAR and SGD show that LLAMA-v1-7B fine-tuned with SFT+RS often outperforms larger zero-shot baselines, demonstrating that smaller, well-grounded models can achieve strong argument-filling performance. The findings highlight the practical viability of automated, robust argument filling in production task-oriented agents and point to scalable directions for grounding-based LLM deployment. $R$-based rejection sampling and slot-wise prompting contribute to robustness and generalization in both open- and closed-model regimes.
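The reward used for rejection sampling can be made concrete with a small sketch. The code below only evaluates $R = 1 - 2 \cdot N_{\mathrm{Error}} / N_{\mathrm{Total}}$ from error counts that are already given. How the four error types are detected, what exactly $N_{\mathrm{Total}}$ counts, and the rule of keeping the highest-reward candidate are assumptions made for illustration. The names `ErrorCounts` and `select_best` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ErrorCounts:
    """Error tallies for one sampled response (hypothetical container)."""
    nk: int     # N_NK
    mk: int     # N_MK
    sv: int     # N_SV
    hv: int     # N_HV
    total: int  # N_Total; assumed here to be a positive count

    def reward(self) -> float:
        """Evaluate R = 1 - 2 * N_Error / N_Total."""
        if self.total <= 0:
            raise ValueError("N_Total must be positive")
        n_error = self.nk + self.mk + self.sv + self.hv
        return 1.0 - 2.0 * n_error / self.total


def select_best(candidates: List[Tuple[str, ErrorCounts]]) -> Tuple[str, ErrorCounts]:
    """Keep the candidate with the highest reward.

    Assumed selection rule; ties go to the earliest candidate.
    """
    return max(candidates, key=lambda item: item[1].reward())


if __name__ == "__main__":
    samples = [
        ("response_a", ErrorCounts(nk=1, mk=0, sv=0, hv=0, total=4)),
        ("response_b", ErrorCounts(nk=0, mk=0, sv=0, hv=0, total=4)),
    ]
    best_text, best_counts = select_best(samples)
    print(best_text, best_counts.reward())
```

Under this formula an error-free response scores $R = 1$, and a response whose error count equals $N_{\mathrm{Total}}$ scores $R = -1$, so the reward is centered at zero when half of the counted items are wrong.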
Abstract
Task-oriented conversational agents interact with users and assist them by leveraging external APIs. A typical task-oriented conversational system can be broken down into three phases: external API selection, argument filling, and response generation. The focus of our work is the task of argument filling, which is responsible for accurately providing the arguments required by the selected API. By comprehending the dialogue history and the pre-defined API schema, the argument filling component is expected to provide the external API with the information necessary to generate a desirable agent action. In this paper, we study the application of Large Language Models (LLMs) to the API argument filling task. Our initial investigation reveals that LLMs require an additional grounding process to successfully perform argument filling, inspiring us to design training and prompting frameworks to ground their responses. Our experimental results demonstrate that, when paired with the proposed techniques, the argument filling performance of LLMs noticeably improves, paving a new way toward building an automated argument filling framework.
