LUCID: LLM-Generated Utterances for Complex and Interesting Dialogues

Joe Stacey, Jianpeng Cheng, John Torr, Tristan Guigue, Joris Driesen, Alexandru Coca, Mark Gaynor, Anders Johannsen

TL;DR

LUCID is a scalable, automated pipeline that generates high-quality task-oriented dialogue data through a modular sequence of LLM calls. By separating intent generation, conversation planning, turn-by-turn generation, and validation, it covers 100 intents across 13 domains, with 501 slots and a wide range of conversational phenomena. A validation protocol and a mock back-end support labeling reliability, yielding a seed dataset of 4,277 dialogues with low labeling error rates that can be used for both in-distribution and out-of-distribution evaluation. The work reports competitive baseline performance on seen intents and meaningful generalization to unseen intents, and provides open-source tooling for large-scale, automated data generation for new target domains in dialogue systems.

Abstract

Spurred by recent advances in Large Language Models (LLMs), virtual assistants are poised to take a leap forward in terms of their dialogue capabilities. Yet a major bottleneck to achieving genuinely transformative task-oriented dialogue capabilities remains the scarcity of high quality data. Existing datasets, while impressive in scale, have limited domain coverage and contain few genuinely challenging conversational phenomena; those which are present are typically unlabelled, making it difficult to assess the strengths and weaknesses of models without time-consuming and costly human evaluation. Moreover, creating high quality dialogue data has until now required considerable human input, limiting both the scale of these datasets and the ability to rapidly bootstrap data for a new target domain. We aim to overcome these issues with LUCID, a modularised and highly automated LLM-driven data generation system that produces realistic, diverse and challenging dialogues. We use LUCID to generate a seed dataset of 4,277 conversations across 100 intents to demonstrate its capabilities, with a human review finding consistently high quality labels in the generated data.

Paper Structure

This paper contains 29 sections, 7 figures, and 10 tables.

Figures (7)

  • Figure 1: An extract of a LUCID conversation containing a challenging phenomenon. In this case, the second user response is most likely to be from an overheard conversation rather than providing the desired slot value.
  • Figure 2: The stages of LUCID data generation: generating intents (stages 1-2), planning conversations (stages 3-8), generating the conversations (stages 9-12), and validating the system predictions (stages 13-14).
  • Figure 3: A (simplified) example labelled conversation. Each dialogue contains user, system, signal and response turns.
  • Figure 4: Examples for eight of the nine challenging conversational phenomena included in the LUCID dataset. We also included 'cancellation' examples which are similar to 'delay confirmation', resulting in the system not confirming a given intent.
  • Figure 5: An example conversation from LUCID (Example #1). As described in the examples section, we show the first three LUCID conversations to provide an unbiased sample of our generated data.
  • ...and 2 more figures
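
The stage grouping in the Figure 2 caption and the turn types in the Figure 3 caption can be written down as a small lookup. This is a minimal sketch for orientation only: the names `Phase`, `PHASE_STAGES`, `TURN_TYPES`, and `phase_of` are hypothetical and are not taken from the LUCID tooling, and only the stage ranges and turn-type names come from the captions.

```python
from enum import Enum


class Phase(Enum):
    """Four phases of the pipeline, as named in the Figure 2 caption."""
    INTENT_GENERATION = "generating intents"
    CONVERSATION_PLANNING = "planning conversations"
    CONVERSATION_GENERATION = "generating the conversations"
    VALIDATION = "validating the system predictions"


# Stage ranges from the Figure 2 caption (range() excludes its upper bound,
# so range(1, 3) covers stages 1-2).
PHASE_STAGES = {
    Phase.INTENT_GENERATION: range(1, 3),         # stages 1-2
    Phase.CONVERSATION_PLANNING: range(3, 9),     # stages 3-8
    Phase.CONVERSATION_GENERATION: range(9, 13),  # stages 9-12
    Phase.VALIDATION: range(13, 15),              # stages 13-14
}

# Turn types listed in the Figure 3 caption.
TURN_TYPES = ("user", "system", "signal", "response")


def phase_of(stage: int) -> Phase:
    """Return the phase that a numbered pipeline stage belongs to."""
    for phase, stages in PHASE_STAGES.items():
        if stage in stages:
            return phase
    raise ValueError(f"stage {stage} is outside the 14 stages in Figure 2")
```

For instance, `phase_of(8)` returns `Phase.CONVERSATION_PLANNING`, since stage 8 is the last of the planning stages.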