Conversational Tree Search: A New Hybrid Dialog Task
Dirk Väth, Lindsey Vanderlyn, Ngoc Thang Vu
TL;DR
This work introduces Conversational Tree Search (CTS), a hybrid task that merges FAQ-style retrieval with task-oriented dialog: domain experts craft dialog trees, and learned policies navigate them efficiently. It defines a formal CTS framework, collects REIMBURSE (a German travel-reimbursement CTS dataset with 123-node trees and 452 user questions), and develops a domain-agnostic user simulator. A novel RL architecture based on a re-parameterized DDQN, augmented with Munchausen RL, Double Q-learning, and hindsight experience replay, learns to distinguish between guided and free dialog modes and to ask only the necessary questions. It outperforms a strong baseline, even on noisy and unseen utterances. The results indicate robust text understanding, generalization to new data, and practical potential for hybrid, controllable dialog systems in real-world domains. Overall, CTS offers a scalable, interpretable way to combine expert-designed dialog structure with data-driven optimization, improving goal completion while reducing unnecessary questions, with implications for information-seeking applications in low-resource settings.
Abstract
Conversational interfaces provide a flexible and easy way for users to seek information that may otherwise be difficult or inconvenient to obtain. However, existing interfaces generally fall into one of two categories: FAQs, where users must have a concrete question in order to retrieve a general answer, or dialogs, where users must follow a predefined path but may receive a personalized answer. In this paper, we introduce Conversational Tree Search (CTS) as a new task that bridges the gap between FAQ-style information retrieval and task-oriented dialog, allowing domain experts to define dialog trees that can then be converted to an efficient dialog policy that learns to ask only the questions necessary to navigate a user to their goal. We collect a dataset for the travel reimbursement domain and demonstrate a baseline as well as a novel deep Reinforcement Learning architecture for this task. Our results show that the new architecture combines the positive aspects of both the FAQ and dialog systems used in the baseline and achieves higher goal completion while skipping unnecessary questions.
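To make the idea of "asking only the necessary questions" concrete, the following is a minimal illustrative sketch, not the architecture from the paper. It assumes a hypothetical `Node` class for an expert-written dialog tree and a hand-written `navigate` rule that skips any question whose answer is already known. In the actual task this behavior is learned by an RL policy from user utterances; the tree content, the fact names, and the `known_facts` dictionary here are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a hypothetical expert-written dialog tree."""
    text: str
    # Name of the fact this node asks about; None for leaf (answer) nodes.
    asks: Optional[str] = None
    # Maps a user answer to the next node; empty for leaf nodes.
    children: Dict[str, "Node"] = field(default_factory=dict)


def navigate(
    root: Node,
    known_facts: Dict[str, str],
    ask_user: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf, asking only for facts not already known.

    Returns the leaf text and the list of questions that were actually asked.
    This is a fixed rule standing in for a learned policy (an assumption).
    """
    node, asked = root, []
    while node.children:
        answer = known_facts.get(node.asks)
        if answer is None:
            # The fact is unknown, so this question is necessary.
            answer = ask_user(node.text)
            asked.append(node.text)
        node = node.children[answer]
    return node.text, asked


# Invented placeholder tree; it does not reproduce the REIMBURSE data.
tree = Node(
    text="Is the trip domestic or abroad?",
    asks="destination",
    children={
        "domestic": Node(text="Leaf B (placeholder answer text)."),
        "abroad": Node(
            text="Did you book through the travel office?",
            asks="booked_via_office",
            children={
                "yes": Node(text="Leaf A (placeholder answer text)."),
                "no": Node(text="Leaf C (placeholder answer text)."),
            },
        ),
    },
)

# The destination is already known, so only the second question is asked.
answer, asked = navigate(tree, {"destination": "abroad"}, lambda q: "yes")
print(answer, asked)
```

With an empty `known_facts`, the same function asks every question on the path, which corresponds to the fully guided dialog mode; with all facts known, it jumps straight to a leaf, which is closer to FAQ-style retrieval.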
