STAR: A Schema-Guided Dialog Dataset for Transfer Learning

Johannes E. M. Mosig, Shikib Mehri, Thomas Kober

TL;DR

STAR introduces a schema-guided, transfer-focused dialog dataset designed to enable robust zero-shot generalization across tasks and domains in task-oriented dialog. It pairs explicit, graph-based task schemas with a scalable crowd-sourcing pipeline and schema-conditioned models for next-action prediction and response generation. The authors show that schema guidance can improve multi-task transfer and generation quality, while highlighting challenges in seen-task action prediction and zero-shot gaps. Overall, STAR provides a valuable benchmark and methodological framework for evaluating and advancing schema-based transfer learning in conversational AI.

Abstract

We present STAR, a schema-guided task-oriented dialog dataset consisting of 127,833 utterances and knowledge base queries across 5,820 task-oriented dialogs in 13 domains that is especially designed to facilitate task and domain transfer learning in task-oriented dialog. Furthermore, we propose a scalable crowd-sourcing paradigm to collect arbitrarily large datasets of the same quality as STAR. Moreover, we introduce novel schema-guided dialog models that use an explicit description of the task(s) to generalize from known to unknown tasks. We demonstrate the effectiveness of these models, particularly for zero-shot generalization across tasks and domains.

Paper Structure

This paper contains 29 sections, 3 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Flow chart representation of the schema-graph for the task. The corresponding schema file is shown in the appendix (apx:format).
  • Figure 2: Wizard's graphical user interface with annotation. From left to right, top to bottom. Orange: Tabs to switch tasks. Purple: Tabs to switch between schema flow chart and knowledge base. Green: Knowledge base interface (different for each task). Red: Knowledge base item. Magenta: Response query field. Blue: Suggested responses.
  • Figure 3: Accuracy of the response selector vs. number of turns that it takes into account.
  • Figure 4: Schema graph representation corresponding to the flow chart visualized in Figure 1.
  • Figure 5: Schema-guided next action prediction model, as described in the next action prediction section (sec:models:action) in Equations 1-4.
  • ...and 3 more figures