LLM-based Frameworks for API Argument Filling in Task-Oriented Conversational Systems

Jisoo Mok; Mohammad Kachuee; Shuyang Dai; Shayan Ray; Tara Taghavi; Sungroh Yoon

TL;DR

This work investigates using Large Language Models (LLMs) for API argument filling in task-oriented dialogue systems and identifies grounding to API schemas and dialogue history as essential to prevent syntax errors and hallucinations. It proposes a dual grounding strategy: (i) an instruction-tuning pipeline with supervised fine-tuning (SFT) plus rejection sampling (RS) for open-source LLMs, with a reward $R = 1 - 2 \cdot N_{\mathrm{Error}} / N_{\mathrm{Total}}$ and $N_{\mathrm{Error}} = N_{\mathrm{NK}} + N_{\mathrm{MK}} + N_{\mathrm{SV}} + N_{\mathrm{HV}}$ to select high-quality samples, and (ii) a multi-step prompting scheme for closed-source LLMs that fills one argument at a time. Empirical results on STAR and SGD show that LLAMA-v1-7B fine-tuned with SFT+RS often outperforms larger zero-shot baselines, demonstrating that smaller, well-grounded models can achieve strong argument-filling performance. The findings highlight the practical viability of automated, robust argument filling in production task-oriented agents and point to scalable directions for grounding-based LLM deployment. $R$-based rejection sampling and slot-wise prompting contribute to robustness and generalization in both open- and closed-model regimes.
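The reward above can be illustrated with a minimal Python sketch. It treats the four error counts (NK, MK, SV, HV) as opaque integers supplied by some external checker, since this summary does not define them. The names `ErrorCounts`, `reward`, and `select_best` are hypothetical, and picking the single highest-reward candidate is an assumed selection rule, not necessarily the paper's.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Error counts for one sampled response (hypothetical container)."""
    nk: int     # N_NK
    mk: int     # N_MK
    sv: int     # N_SV
    hv: int     # N_HV
    total: int  # N_Total, the normaliser used in the reward


def reward(counts: ErrorCounts) -> float:
    """Compute R = 1 - 2 * N_Error / N_Total."""
    if counts.total <= 0:
        raise ValueError("N_Total must be positive")
    n_error = counts.nk + counts.mk + counts.sv + counts.hv
    return 1.0 - 2.0 * n_error / counts.total


def select_best(samples: list[tuple[str, ErrorCounts]]) -> str:
    """Return the sampled response with the highest reward.

    Assumption: best-of-n selection; a threshold on R would be
    an equally plausible reading of "select high-quality samples".
    """
    best_text, _ = max(samples, key=lambda s: reward(s[1]))
    return best_text
```

Under this formula an error-free sample scores $R = 1$, and a sample with as many errors as $N_{\mathrm{Total}}$ scores $R = -1$.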

Abstract

Task-oriented conversational agents interact with users and assist them by leveraging external APIs. A typical task-oriented conversational system can be broken down into three phases: external API selection, argument filling, and response generation. The focus of our work is argument filling, the phase responsible for accurately providing the arguments required by the selected API. By comprehending the dialogue history and the pre-defined API schema, the argument filling step is expected to provide the external API with the information necessary to generate a desirable agent action. In this paper, we study the application of Large Language Models (LLMs) to the API argument filling task. Our initial investigation reveals that LLMs require an additional grounding process to perform argument filling successfully, inspiring us to design training and prompting frameworks to ground their responses. Our experimental results demonstrate that, when paired with the proposed techniques, the argument filling performance of LLMs noticeably improves, paving a new way toward building an automated argument filling framework.

Paper Structure (20 sections, 5 figures, 2 tables)

Introduction
Related Works
Language Models for Task-oriented Dialogues
Large Language Models and Instruction-tuning
Proposed Methodology
Prompt Design
Instruction-tuning Framework for Open-sourced LLMs
Multi-step Prompting Scheme for Closed-sourced LLMs
Experimental Set-up
Datasets and Models
Datasets
Models
Libraries and Hyperparameters
Compared Approaches
Metrics
...and 5 more sections

Figures (5)

Figure 1: An overview of how a task-oriented conversational agent might complete a user's request to book a haircut. To begin with, the agent selects the "Hair Appointment" API from the list of available APIs. An example of the pre-defined API schema associated with the "Hair Appointment" API is given on the far right side. Following API selection, the argument filling step utilizes the API schema and dialogue history to identify arguments to complete the API call. Finally, the agent responds to the user with the utterances produced in the response generation step.
Figure 2: Examples of two potential errors that can arise in argument filling. (a) Syntax errors are outputs that deviate from the pre-defined API schema. (b) Hallucinations are outputs that deviate from the user's intention and utterances.
Figure 3: Abbreviated illustration of the default prompt template that includes API description and dialogue history. We also provide an example of a ground-truth argument, which is pre-processed to follow a dictionary-like format.
Figure 4: Analyses of four different error rates on (a) STAR and (b) SGD in-domain benchmarks.
Figure 5: Analyses of four different error rates on (a) STAR and (b) SGD out-of-domain benchmarks.
