Conversational Tree Search: A New Hybrid Dialog Task

Dirk Väth; Lindsey Vanderlyn; Ngoc Thang Vu

TL;DR

This work introduces Conversational Tree Search (CTS), a hybrid task that merges FAQ-style retrieval with task-oriented dialog: domain experts craft dialog trees, and ML policies learn to navigate them efficiently. It defines a formal CTS framework, collects REIMBURSE (a German travel-reimbursement CTS dataset with 123-node trees and 452 user questions), and develops a domain-agnostic user simulator. A novel RL architecture based on a re-parameterized DDQN, augmented with Munchausen RL, Double Q-learning, and hindsight experience replay, learns to differentiate between guided and free dialog modes and to ask only necessary questions. It outperforms a strong baseline, even under noisy and unseen utterances. The results demonstrate robust text understanding, generalization to new data, and practical potential for hybrid, controllable dialog systems in real-world domains. Overall, CTS offers a scalable, interpretable way to combine expert-designed dialog structure with data-driven optimization, improving goal completion while reducing unnecessary questions, with broad implications for information-seeking applications in low-resource settings.

Abstract

Conversational interfaces provide a flexible and easy way for users to seek information that may otherwise be difficult or inconvenient to obtain. However, existing interfaces generally fall into one of two categories: FAQs, where users must have a concrete question in order to retrieve a general answer, or dialogs, where users must follow a predefined path but may receive a personalized answer. In this paper, we introduce Conversational Tree Search (CTS) as a new task that bridges the gap between FAQ-style information retrieval and task-oriented dialog, allowing domain experts to define dialog trees which can then be converted to an efficient dialog policy that learns to ask only the questions necessary to navigate a user to their goal. We collect a dataset for the travel reimbursement domain and demonstrate a baseline as well as a novel deep Reinforcement Learning architecture for this task. Our results show that the new architecture combines the positive aspects of both the FAQ and dialog systems used in the baseline and achieves higher goal completion while skipping unnecessary questions.

Paper Structure (46 sections, 7 figures, 6 tables, 1 algorithm)

Introduction
Definition of Conversational Tree Search Task
Formal Task Description
Guided Dialog
Free Dialog
Evaluation Objectives
Methods
User Simulator
Baseline
Reinforcement Learning Model
REIMBURSE: A Dataset for CTS
Dialog Tree
User Utterances
Experimental Setup
RQ1: Task Performance
...and 31 more sections

Figures (7)

Figure 1: An example of the proposed task: Slice of a dialog tree (blue/gray nodes, black edges) showing how progressively more concrete questions could be answered. Question a) guiding a user with a general goal through the tree, b) asking only at nodes that need more clarification, and c) requiring no clarification and thus receiving a direct answer.
Figure 2: Example graph with dialog actions and path $p$ from start node $s$ to goal node $g$. As $\text{ASK-}g \in p$, this represents a dialog where the user reached their goal.
Figure 3: Baseline architecture: Combines an FAQ retrieval system (free mode) and a handcrafted dialog system (guided mode), with a classifier deciding which policy is active based on the input in the first turn.
Figure 4: Proposed RL architecture: The state-value function $V(\mathbf{s})$ for state $\mathbf{s}$ is calculated from the shared layers (b), independent of action inputs. For the advantage function $A(\mathbf{s},a)$, the outputs from shared layers (b) and action-specific layers (c) are combined. The final state-action-values $Q$ are obtained by performing one forward pass per action $a_i$, each yielding a scalar $q_{\mathbf{s},a_i}$, which are then concatenated by state into a vector with one value per action. Additionally, a binary dialog mode classifier is added on top of the shared layers (b).
Figure 5: Task performance for different noise levels on user input (drawn from a normal distribution around the original text encoding vector $\mathbf{u}$, using a percentage $n$ of $\mathbf{u}$ as standard deviation: $\mathcal{N}(\mathbf{u}, n |\mathbf{u}|)$).
...and 2 more figures
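
The noise model in the Figure 5 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `add_encoding_noise` is hypothetical, and it assumes that $|\mathbf{u}|$ denotes the element-wise absolute value of the encoding, so each component $u_i$ is resampled from $\mathcal{N}(u_i, n|u_i|)$. If $|\mathbf{u}|$ instead means the vector norm, the standard deviation would be the same for every component.

```python
import random


def add_encoding_noise(u, n, rng=None):
    """Return a noisy copy of the text encoding `u`.

    Assumption (not confirmed by the source): |u| is taken element-wise,
    so component u_i is drawn from a normal distribution with mean u_i
    and standard deviation n * |u_i|.
    """
    rng = rng or random.Random()
    return [rng.gauss(u_i, n * abs(u_i)) for u_i in u]


# Hypothetical 4-dimensional encoding; real sentence encodings are much larger.
u = [0.5, -1.2, 0.0, 2.0]
noisy = add_encoding_noise(u, n=0.1, rng=random.Random(0))
```

With `n = 0` the encoding is returned unchanged, and a component that is exactly zero stays zero under this element-wise reading.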
