Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems
Xiujun Li, Yu Wang, Siqi Sun, Sarah Panda, Jingjing Liu, Jianfeng Gao
TL;DR
The paper proposes a Dialogue Challenge to advance end-to-end task-completion dialogue systems by releasing human-annotated conversational data in three domains and a unified experimental platform with a built-in simulator for each domain. Key contributions are the datasets for movie-ticket booking, restaurant reservation, and taxi ordering, the accompanying knowledge bases, a configurable user simulator, and a dual evaluation strategy in which submitted systems are assessed both in simulation and by human judges. The practical impact is a standardized, low-cost environment in which researchers can train, test, and compare end-to-end dialogue policies, including reinforcement learning approaches, on realistic tasks.
Abstract
This proposal introduces a Dialogue Challenge for building end-to-end task-completion dialogue systems, with the goal of encouraging the dialogue research community to collaborate and benchmark on standard datasets and a unified experimental environment. In this special session, we will release human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi booking), as well as an experimental platform with built-in simulators for each domain, for training and evaluation purposes. The final submitted systems will be evaluated both in a simulated setting and by human judges.
