Towards Zero-Shot, Controllable Dialog Planning with LLMs

Dirk Väth; Ngoc Thang Vu

TL;DR

CTS-LLM introduces a zero-shot, controllable CTS policy that guides dialog planning via domain graphs using an LLM to search and prune relevant nodes. It achieves real-time feasibility by combining an embedding-based pre-filter with LLM reasoning, avoiding training and preventing hallucination by keeping outputs within predefined templates. In simulation across three CTS domains, CTS-LLM significantly outperforms state-of-the-art RL CTS agents in task success, and user studies confirm improvements in real-world REIMBURSE-en tasks. The approach generalizes to different domain graphs and supports both large commercial and smaller open LLMs, offering scalable, zero-training dialog planning for sensitive domains.
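The two-stage idea mentioned above (a cheap embedding-based pre-filter followed by LLM reasoning over the surviving candidates) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `embed` is a toy bag-of-words stand-in for a real sentence encoder, `llm_choose` is a hypothetical callable that wraps whatever LLM is used, and the node texts are invented.

```python
import math
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real system would use a sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def prefilter(query: str, nodes: dict, k: int) -> list:
    """Return the ids of the k graph nodes whose text is most similar to the query."""
    q = embed(query)
    ranked = sorted(nodes, key=lambda n: cosine(q, embed(nodes[n])), reverse=True)
    return ranked[:k]


def select_goal(query: str, nodes: dict, k: int,
                llm_choose: Callable[[str, dict], str]) -> Optional[str]:
    """Let the (hypothetical) LLM pick among the k pre-filtered candidates only."""
    candidates = prefilter(query, nodes, k)
    choice = llm_choose(query, {n: nodes[n] for n in candidates})
    # Accept only ids from the candidate set, so free-form LLM text is never shown to the user.
    return choice if choice in candidates else None
```

The final membership check is what keeps the output inside the predefined graph content: an answer outside the candidate set is discarded rather than passed on.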

Abstract

Recently, Large Language Models (LLMs) have emerged as an alternative to training task-specific dialog agents, due to their broad reasoning capabilities and performance in zero-shot learning scenarios. However, many LLM-based dialog systems fall short in planning towards an overarching dialog goal and therefore cannot steer the conversation appropriately. Furthermore, these models struggle with hallucination, making them unsuitable for information access in sensitive domains, such as legal or medical domains, where correctness of information given to users is critical. The recently introduced task Conversational Tree Search (CTS) proposes the use of dialog graphs to avoid hallucination in sensitive domains. However, although state-of-the-art CTS agents excel at dialog strategy, they are Reinforcement Learning (RL) based and require long training times. This paper introduces a novel zero-shot method for controllable CTS agents, where LLMs guide the dialog planning through domain graphs by searching and pruning relevant graph nodes based on user interaction preferences. We show that these agents significantly outperform state-of-the-art CTS agents ($p<0.0001$; Barnard Exact test) in simulation, and that this result generalizes to all available CTS domains. Finally, we perform a user evaluation to test the agent's performance in the wild, showing that our policy significantly ($p<0.05$; Barnard Exact test) improves task success compared to the state-of-the-art RL-based CTS agent.
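To make the idea of planning through a dialog graph more concrete, the following is a rough sketch of one way to plan paths from a start node to several candidate goal nodes and to find the node where those paths diverge, which is a natural place to ask the user a clarifying question. It is an assumption-laden illustration rather than the authors' algorithm: the graph is a plain adjacency dict, breadth-first search stands in for the LLM-guided search, and the names `shortest_path`, `decision_point`, and `plan` are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def decision_point(paths):
    """Last node shared by all paths before they diverge."""
    shared = None
    for step in zip(*paths):
        if len(set(step)) != 1:
            break
        shared = step[0]
    return shared


def plan(graph, start, goals):
    """Plan a path to every reachable goal and report where the paths split."""
    paths = [shortest_path(graph, start, g) for g in goals]
    paths = [p for p in paths if p is not None]
    return paths, (decision_point(paths) if paths else None)
```

With a single reachable goal, `decision_point` returns the goal itself, which corresponds to the case where no clarification is needed.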

Paper Structure (32 sections, 4 figures, 11 tables, 2 algorithms)

Introduction
Related Work
Dialog Planning
Conversational Tree Search
Zero-Shot Learning
Information Retrieval
Natural Language Understanding
Method
Dialog Policy
Navigation Module
Interaction Mode Classification
Intent Classification
Goal Node Filter
Datasets
User Study
...and 17 more sections

Figures (4)

Figure 1: To plan a dialog, the proposed agent tracks possible user goals (green: $G_1, G_2$) based on the initial user input (purple: $S$), then plans paths (blue: $S \rightarrow ... \rightarrow D \rightarrow G_{1/2}$) to reach them, identifying decision points to proactively prompt the user for more information (orange: $D$).
Figure 2: Example dialog graph and three different user inputs, leading to step-by-step conversation (a), clarification steps (b), or directly skipping to the answer (c) (Väth et al., 2023).
Figure 3: Interaction between user and system. The system predicts user intent and interaction mode from the user utterance (1). If the interaction mode is free, the user utterance is used to retrieve answer candidates from the dialog graph (2a). In guided mode (2b), the policy follows the edge associated with the predicted user intent (3b). The agent may also decide to ask for clarification or output information (3a).
Figure 4: Recall of pre-filtering the possible goal nodes for different $k$ on the REIMBURSE-en train split.
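The recall-at-$k$ quantity plotted in Figure 4 can be computed as in the small sketch below. It assumes one gold goal node per query and a ranked candidate list per query; the function name `recall_at_k` and the example data are illustrative and are not drawn from REIMBURSE-en.

```python
def recall_at_k(ranked, gold, k):
    """Fraction of queries whose gold goal node appears among the top-k candidates.

    ranked: one ranked list of node ids per query (best first)
    gold:   the correct goal node id for each query
    """
    if not gold:
        raise ValueError("need at least one query")
    hits = sum(1 for candidates, target in zip(ranked, gold) if target in candidates[:k])
    return hits / len(gold)
```

Sweeping `k` over a range and plotting the result gives the kind of curve used to choose a pre-filter size that keeps recall high while limiting how many nodes the LLM has to read.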
