
Interpreting User Requests in the Context of Natural Language Standing Instructions

Nikita Moghe, Patrick Xia, Jacob Andreas, Jason Eisner, Benjamin Van Durme, Harsh Jhamtani

TL;DR

This work introduces natural language standing instructions: persistent user preferences that condition LLM-driven dialogue. It formalizes the task as selecting the instructions relevant to a dialogue and interpreting them into API calls. It presents NLSI, a dataset of over 2.4K dialogues across 17 domains that categorizes instruction-utterance interactions into six reasoning types, and provides a suite of baselines combining selection and interpretation using prompting, retrieval, and memory-augmented strategies. Experimental results reveal substantial challenges: even when the gold standing instructions are supplied (Oracle), interpretation remains far from perfect, and the best exact match (EM) across methods reaches only 44.7%, with notable variance across reasoning types. The findings highlight the need for improved retrieval, reasoning, and structured parsing techniques, as well as memory-augmented approaches, to robustly leverage standing instructions in task-oriented dialogue across diverse domains.

Abstract

Users of natural language interfaces, generally powered by Large Language Models (LLMs), often must repeat their preferences each time they make a similar request. We describe an approach to LLM-based dialogue modeling in which persistent user constraints and preferences -- collectively termed standing instructions -- are provided as additional context for such interfaces. For example, when a user states "I'm hungry", a previously expressed preference for Persian food can be automatically added to the LLM prompt, influencing the search for relevant restaurants. We develop NLSI, a language-to-program dataset consisting of over 2.4K dialogues spanning 17 domains, where each dialogue is paired with a user profile (a set of user-specific standing instructions) and corresponding structured representations (API calls). A key challenge in NLSI is to identify which subset of the standing instructions is applicable to a given dialogue. NLSI contains diverse phenomena, from simple preferences to interdependent instructions such as triggering a hotel search whenever the user is booking tickets to an event. We conduct experiments on NLSI using prompting with large language models and various retrieval approaches, achieving a maximum of 44.7% exact match on API prediction. Our results demonstrate the challenges in identifying the relevant standing instructions and interpreting them into API calls.
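
To make the task concrete, the following is a minimal sketch of what one example and an exact-match check might look like, built around the "I'm hungry" example from the abstract. The field names, the instruction wording, the `GetRestaurants` call format, and the whitespace normalization are all assumptions made for illustration; they are not taken from the NLSI release or its evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical NLSI-style record; field names are illustrative only."""
    utterance: str
    profile: list[str]      # every standing instruction stored for this user
    applicable: list[str]   # gold subset that is relevant to the utterance
    api_calls: list[str]    # gold structured output


def normalize(call: str) -> str:
    """Remove all whitespace so spacing differences do not affect the match."""
    return "".join(call.split())


def exact_match(predicted: list[str], gold: list[str]) -> bool:
    """True only if the predicted set of calls equals the gold set.

    Assumption: calls are compared as whole strings, so argument order
    inside a call still matters in this toy version.
    """
    return {normalize(c) for c in predicted} == {normalize(c) for c in gold}


example = Example(
    utterance="I'm hungry",
    profile=[
        "If I ask for restaurants, I prefer Persian food.",
        "When I book a flight, choose economy class.",
    ],
    applicable=["If I ask for restaurants, I prefer Persian food."],
    api_calls=['GetRestaurants(cuisine="Persian")'],
)

# A prediction that applies the relevant standing instruction...
with_preference = ['GetRestaurants(cuisine = "Persian")']
# ...and one that ignores the user profile entirely.
without_preference = ["GetRestaurants()"]

print(exact_match(with_preference, example.api_calls))     # True
print(exact_match(without_preference, example.api_calls))  # False
```

In this sketch a prediction counts as correct only when every call matches, which is why ignoring a single applicable instruction is enough to fail the example.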

Paper Structure (51 sections, 1 equation, 5 figures, 9 tables)


Figures (5)

  • Figure 1: Parsing an utterance into a structured output, in the presence of a user-specific set of standing instructions. A model for the task needs to identify (explicitly or implicitly) the subset of instructions applicable to the utterance and interpret the utterance into API calls.
  • Figure 2: Illustration of different prompting methods. The blocks in red are the expected output generation and every other block is part of the input. The green bits are repeated K times, providing K demonstrations for in-context learning. Direct Interpretation conditions the generation of API calls on the user profile and user utterance. Select-And-Interpret requires the generation of the appropriate standing instructions based on user profile and user utterance followed by API generation. Select-Then-Interpret receives the predicted standing instructions from a separate Selection Model (see left) in addition to the user utterance and then generates the API calls. The selection step only generates the standing instructions based on the user profile and the user utterance.
  • Figure 3: Prompt for the ICL Selection task. The number and type of examples vary according to the experiment.
  • Figure 4: Prompt used for interpretation experiments. The figure includes the template for both demonstration examples and test examples. The demonstration template is repeated once per demonstration example used.
  • Figure 5: Demonstration and test example format for Select-And-Interpret experiments.
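
The input and output blocks described in the Figure 2 caption can be sketched as a small prompt builder. This is only an illustration of which fields each method conditions on and which it generates. The section labels, the dictionary keys, the example instructions, and the API call strings are hypothetical, and the actual prompt templates shown in Figures 3 to 5 are not reproduced here.

```python
# Hypothetical section labels; the paper's real templates may differ.
LABELS = {
    "profile": "Profile",
    "utterance": "Utterance",
    "instructions": "Applicable instructions",
    "api_calls": "API calls",
}

# (input fields, output fields) per method, following the Figure 2 caption.
METHODS = {
    "direct": (["profile", "utterance"], ["api_calls"]),
    "select_and_interpret": (["profile", "utterance"], ["instructions", "api_calls"]),
    "selection": (["profile", "utterance"], ["instructions"]),
    "select_then_interpret": (["instructions", "utterance"], ["api_calls"]),
}


def render(record: dict, fields: list[str]) -> str:
    """Render the requested fields of one record as labeled sections."""
    parts = []
    for name in fields:
        value = record[name]
        text = "\n".join(value) if isinstance(value, list) else value
        parts.append(f"{LABELS[name]}:\n{text}")
    return "\n".join(parts)


def build_prompt(method: str, demos: list[dict], test: dict) -> str:
    """K demonstrations (inputs and outputs), then the test inputs and a cue."""
    inputs, outputs = METHODS[method]
    blocks = [render(d, inputs + outputs) for d in demos]
    # The test example stops at the first output label; the model continues.
    blocks.append(render(test, inputs) + f"\n{LABELS[outputs[0]]}:")
    return "\n\n".join(blocks)


demo = {
    "profile": [
        "When I book a flight, choose economy class.",
        "I like jazz concerts.",
    ],
    "utterance": "Find me a flight to Boston",
    "instructions": ["When I book a flight, choose economy class."],
    "api_calls": ['GetFlights(destination="Boston", seating_class="Economy")'],
}
test = {
    "profile": [
        "If I ask for restaurants, I prefer Persian food.",
        "When I book a flight, choose economy class.",
    ],
    "utterance": "I'm hungry",
}

# Step 1 of Select-Then-Interpret: prompt a separate selection model.
selection_prompt = build_prompt("selection", [demo], test)
# A real system would send selection_prompt to a model; the prediction
# is hard-coded here so the sketch stays self-contained.
test["instructions"] = ["If I ask for restaurants, I prefer Persian food."]
# Step 2: interpret using only the selected instructions and the utterance.
interpret_prompt = build_prompt("select_then_interpret", [demo], test)
print(interpret_prompt)
```

Under these assumptions, the only difference between the methods is the field lists in `METHODS`: Direct never surfaces the selected instructions, Select-And-Interpret generates them before the API calls in one pass, and Select-Then-Interpret replaces the full profile with the output of the separate selection step.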