GODEL: Large-Scale Pre-Training for Goal-Directed Dialog

Baolin Peng, Michel Galley, Pengcheng He, Chris Brockett, Lars Liden, Elnaz Nouri, Zhou Yu, Bill Dolan, Jianfeng Gao

TL;DR

GODEL is a large, open-source dialog model trained in three phases so that its responses can be grounded in external information. The work introduces a unified extrinsic (utility) evaluation alongside traditional intrinsic measures and reports improved few-shot performance across knowledge-grounded generation, task-oriented dialog, and conversational QA. Grounded pre-training, combined with a Transformer encoder-decoder architecture offered at multiple model scales, yields strong results and supports rapid adaptation to new tasks, as validated by both human and automatic evaluation. The authors release code, data processing scripts, and models to facilitate reuse and further research. Overall, the work argues for prioritizing utility-driven evaluation in open-domain dialog systems and reports substantial gains from grounding on external knowledge during pre-training.

Abstract

We introduce GODEL (Grounded Open Dialogue Language Model), a large pre-trained language model for dialog. In contrast with earlier models such as DialoGPT, GODEL leverages a new phase of grounded pre-training designed to better support adapting GODEL to a wide range of downstream dialog tasks that require information external to the current conversation (e.g., a database or document) to produce good responses. Experiments against an array of benchmarks that encompass task-oriented dialog, conversational QA, and grounded open-domain dialog show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups, in terms of both human and automatic evaluation. A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses (extrinsic evaluation) in addition to their communicative features (intrinsic evaluation). We show that extrinsic evaluation offers improved inter-annotator agreement and correlation with automated metrics. Code and data processing scripts are publicly available.

Paper Structure

This paper contains 15 sections, 1 equation, 3 figures, 16 tables.

Figures (3)

  • Figure 1: GODEL pre-training and fine-tuning with a Transformer-based encoder-decoder model, which takes the dialog context and the environment (world state or external knowledge) as input, represented as a single string.
  • Figure 2: Sample training instance, with conversation history in red, grounding in blue, and response in green.
  • Figure 3: Human evaluation task design.
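
Figures 1 and 2 describe each training instance as a conversation history, a piece of grounding, and a response, with the history and grounding passed to the encoder as one string. The snippet below is a minimal sketch of how such a serialization might look. The `DialogInstance` class, the `serialize_source` function, the ` EOS ` turn separator, and the `[KNOWLEDGE]` tag are assumptions made for illustration, not details given in the text above, and the example dialog is invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DialogInstance:
    """One training instance, mirroring the three parts named in Figure 2."""
    history: List[str]  # prior turns, oldest first
    grounding: str      # external knowledge or world state; may be empty
    response: str       # target text the decoder is trained to produce


def serialize_source(
    instance: DialogInstance,
    turn_sep: str = " EOS ",             # assumed turn separator
    grounding_tag: str = "[KNOWLEDGE]",  # assumed grounding marker
) -> str:
    """Flatten history and grounding into a single encoder input string."""
    context = turn_sep.join(turn.strip() for turn in instance.history)
    grounding = instance.grounding.strip()
    if grounding:
        return f"{context} {grounding_tag} {grounding}"
    # Without grounding, the source is just the dialog context.
    return context


# Invented example data, for illustration only.
example = DialogInstance(
    history=[
        "I need a cheap restaurant.",
        "Which area do you prefer?",
        "The city centre.",
    ],
    grounding="restaurant: name=Example Bistro, price=cheap, area=centre",
    response="Example Bistro is a cheap option in the centre.",
)

source_text = serialize_source(example)  # encoder input
target_text = example.response           # decoder target
print(source_text)
```

Under these assumptions, an encoder-decoder model would be trained to map `source_text` to `target_text`. An instance with empty grounding reduces to plain dialog context, so the same format can cover both ungrounded and grounded data.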