A Simple Language Model for Task-Oriented Dialogue

Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher

TL;DR

Task-oriented dialogue (TOD) research often relies on modular pipelines, in which an error in one component can propagate to the next. This paper introduces SimpleTOD, a unified end-to-end approach that recasts all TOD sub-tasks as a single sequence prediction task for a causal Transformer decoder, which enables direct transfer learning from open-domain pre-trained language models such as GPT-2. On MultiWOZ, SimpleTOD reports state-of-the-art joint goal accuracy for dialogue state tracking (DST) and improves the end-to-end inform rate, success rate, and combined score over prior work. Ablation studies highlight the importance of special tokens, pretraining, and decoding choices, while analyses of noisy annotations and long-context dialogues indicate robustness and scalability. The work suggests that simple, unified sequence modeling can match or exceed complex modular systems, which could make task-oriented dialogue agents easier to deploy.

Abstract

Task-oriented dialogue is often decomposed into three tasks: understanding user input, deciding actions, and generating a response. While such decomposition might suggest a dedicated model for each sub-task, we find a simple, unified approach leads to state-of-the-art performance on the MultiWOZ dataset. SimpleTOD is a simple approach to task-oriented dialogue that uses a single, causal language model trained on all sub-tasks recast as a single sequence prediction problem. This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2. SimpleTOD improves over the prior state-of-the-art in joint goal accuracy for dialogue state tracking, and our analysis reveals robustness to noisy annotations in this setting. SimpleTOD also improves the main metrics used to evaluate action decisions and response generation in an end-to-end setting: inform rate by 8.1 points, success rate by 9.7 points, and combined score by 7.2 points.
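To make the phrase "all sub-tasks recast as a single sequence prediction problem" concrete, the following is a minimal sketch of how one dialogue turn might be flattened into a single training string for a causal language model. The delimiter tokens, the function name `serialize_turn`, the triplet layouts, and the example values are illustrative assumptions, not necessarily the authors' exact format.

```python
def serialize_turn(context, belief, db_result, actions, response):
    """Flatten one dialogue turn into a single string (hypothetical format)."""
    # Dialogue history: speaker-tagged utterances, oldest first.
    context_text = " ".join(f"<|{speaker}|> {utt}" for speaker, utt in context)
    # Belief state: assumed to be (domain, slot, value) triplets.
    belief_text = ", ".join(f"{d} {s} {v}" for d, s, v in belief)
    # System actions: assumed to be (domain, act, slot) triplets.
    action_text = ", ".join(f"{d} {a} {s}" for d, a, s in actions)
    # Segment order mirrors the pipeline: understand, query, decide, respond.
    segments = [
        ("context", context_text),
        ("belief", belief_text),
        ("dbsearch", db_result),
        ("action", action_text),
        ("response", response),
    ]
    # Wrap every segment in assumed start/end delimiter tokens.
    return " ".join(f"<|{name}|> {text} <|endof{name}|>" for name, text in segments)


example = serialize_turn(
    context=[("user", "i need a cheap hotel in the north")],
    belief=[("hotel", "pricerange", "cheap"), ("hotel", "area", "north")],
    db_result="hotel 2 matches",
    actions=[("hotel", "request", "stars")],
    response="there are [value_count] options . do you have a star preference ?",
)
print(example)
```

Under this framing, a single language model trained on such strings learns to continue a context with the belief state, then the action, then the response, so no separate model per sub-task is needed.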

Paper Structure

This paper contains 31 sections, 9 equations, 2 figures, 16 tables.

Figures (2)

  • Figure 1: SimpleTOD is a simple approach to task-oriented dialogue that uses a single causal language model to generate all outputs given the dialogue context and retrieved database search results. The delexicalized response can then be lexicalized into a human-readable response by using information from the belief state and DB search results.
  • Figure 2: SimpleTOD is a simple approach to task-oriented dialogue that approaches all of task-oriented dialogue as a single sequence generation problem, querying a database for necessary information.
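The lexicalization step mentioned in the Figure 1 caption can be sketched as a simple placeholder substitution. This is only an illustration: the bracketed placeholder syntax, the function name `lexicalize`, the key names, and the example values are assumptions made for this sketch and are not taken from the paper.

```python
import re


def lexicalize(delex_response, belief_state, db_record):
    """Replace [placeholder] tokens with concrete values (hypothetical scheme)."""
    # Assumption: DB values take precedence over belief-state values for the same key.
    values = {**belief_state, **db_record}

    def fill(match):
        key = match.group(1)
        # Leave unknown placeholders untouched rather than guessing a value.
        return str(values.get(key, match.group(0)))

    return re.sub(r"\[(\w+)\]", fill, delex_response)


belief_state = {"hotel_area": "north"}
db_record = {"hotel_name": "Example Lodge", "hotel_phone": "00000 000000"}
text = lexicalize(
    "[hotel_name] is in the [hotel_area] . phone : [hotel_phone] . reference : [hotel_reference]",
    belief_state,
    db_record,
)
print(text)
```

Here `[hotel_reference]` stays as a placeholder because neither source supplies a value for it, which keeps the sketch from inventing information the database did not return.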