A User Simulator for Task-Completion Dialogues

Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, Yun-Nung Chen

TL;DR

This work introduces a public user-simulation framework for task-oriented dialogue, targeting the movie-booking domain and supporting two tasks: movie-ticket booking and movie seeking. It combines an agenda-based, rule-driven user model with a data-driven NLG component and a joint NLU model, so that reinforcement learning agents can train offline against a safe, controlled simulator before real-user deployment. The framework provides data and tools, including initialization via rule-based policies and an RL training loop using DQN with experience replay, to facilitate rapid empirical comparisons across agents. The approach addresses data collection challenges in task-oriented dialogue and offers a practical pathway for researchers to develop, test, and benchmark RL-based dialogue systems in a realistic, domain-specific setting.

Abstract

Despite widespread interest in reinforcement learning for task-oriented dialogue systems, several obstacles can frustrate research and development progress. First, reinforcement learners typically require interaction with the environment, so conventional dialogue corpora cannot be used directly. Second, each task presents specific challenges, requiring a separate corpus of task-specific annotated data. Third, collecting and annotating human-machine or human-human conversations for task-oriented dialogues requires extensive domain knowledge. Because building an appropriate dataset can be both financially costly and time-consuming, one popular approach is to build a user simulator based upon a corpus of example dialogues. Then, one can train reinforcement learning agents in an online fashion as they interact with the simulator. Dialogue agents trained on these simulators can serve as an effective starting point. Once agents master the simulator, they may be deployed in a real environment to interact with humans, and continue to be trained online. To ease empirical algorithmic comparisons in dialogues, this paper introduces a new, publicly available simulation framework, where our simulator, designed for the movie-booking domain, leverages both rules and collected data. The simulator supports two tasks: movie ticket booking and movie seeking. Finally, we demonstrate several agents and detail the procedure to add and test your own agent in the proposed framework.

Paper Structure

This paper contains 22 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Learning curve for policy training, without NLU and NLG. The green line is the rule-based agent used to initialize the experience replay buffer; the blue line is the learning curve of the RL agent; the orange line is the optimal upper bound, computed as the ratio of the number of user goals reachable in the agent's database to the total number of user goals.
  • Figure 2: Learning curve for end-to-end policy training, with NLU and NLG. The green line is the rule-based agent used to initialize the experience replay buffer; the blue line is the learning curve of the RL agent; the orange line is the optimal upper bound, computed as the ratio of the number of user goals reachable in the agent's database to the total number of user goals.
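
The upper bound described in both captions is a simple ratio, and a minimal sketch may make it concrete. The captions do not define "reachable", so the code below assumes a goal is reachable when at least one database record matches every slot constraint in the goal. The function names, the slot names, and the toy data are all hypothetical and are not taken from the released framework.

```python
from typing import Dict, List

# Hypothetical representations: a user goal and a database record are both
# flat mappings from slot name to value.
Goal = Dict[str, str]
Record = Dict[str, str]


def is_reachable(goal: Goal, database: List[Record]) -> bool:
    """Assumed definition: some record satisfies every constraint in the goal."""
    return any(
        all(record.get(slot) == value for slot, value in goal.items())
        for record in database
    )


def success_upper_bound(goals: List[Goal], database: List[Record]) -> float:
    """Ratio of reachable user goals to the total number of user goals."""
    if not goals:
        raise ValueError("at least one user goal is required")
    reachable = sum(is_reachable(goal, database) for goal in goals)
    return reachable / len(goals)


# Toy data for illustration only.
database = [
    {"moviename": "zootopia", "city": "seattle", "date": "tomorrow"},
    {"moviename": "deadpool", "city": "seattle", "date": "tonight"},
]
goals = [
    {"moviename": "zootopia", "city": "seattle"},   # matches the first record
    {"moviename": "deadpool", "date": "tonight"},   # matches the second record
    {"moviename": "zootopia", "date": "tonight"},   # no record matches both slots
    {"city": "seattle"},                            # matches either record
]

upper_bound = success_upper_bound(goals, database)
print(f"success-rate upper bound: {upper_bound:.2f}")
```

Under this assumed definition, no agent can succeed on a goal that the database cannot satisfy, which is why the ratio acts as a ceiling on the success rate in the learning curves.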