Unsupervised End-to-End Task-Oriented Dialogue with LLMs: The Power of the Noisy Channel

Brendan King; Jeffrey Flanigan

TL;DR

We propose an expectation-maximization (EM) approach that treats turn-level annotations as latent variables and infers them with a noisy channel model. The resulting end-to-end dialogue agent more than doubles the dialogue success rate of a strong GPT-3.5 baseline.

Abstract

Training task-oriented dialogue systems typically requires turn-level annotations for interacting with their APIs, e.g., a dialogue state and the system actions taken at each step. These annotations can be costly to produce and error-prone, and they require both domain and annotation expertise. With advances in LLMs, we hypothesize that unlabeled data and a schema definition are sufficient for building a working task-oriented dialogue system, completely unsupervised. We consider a novel unsupervised setting with only (1) a well-defined API schema and (2) a set of unlabeled dialogues between a user and an agent. We propose an innovative approach using expectation-maximization (EM) that infers turn-level annotations as latent variables using a noisy channel model to build an end-to-end dialogue agent. On the MultiWOZ benchmark, our method more than doubles the dialogue success rate of a strong GPT-3.5 baseline.
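As a rough illustration of the noisy channel idea described in the abstract, the sketch below reranks candidate annotations by how well each one explains the observed user utterance. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the candidate API-call strings, and the toy log-probabilities are all hypothetical, and a real system would obtain the scores from an LLM rather than a lookup table.

```python
from typing import Callable, Sequence


def rerank_noisy_channel(
    utterance: str,
    candidates: Sequence[str],
    channel_logprob: Callable[[str, str], float],
) -> str:
    """Return the candidate z that maximizes log p(utterance | z).

    `channel_logprob(utterance, z)` is a hypothetical scoring function;
    in practice it would query a language model.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda z: channel_logprob(utterance, z))


# Toy stand-in for an LLM scoring call. The numbers are invented for
# illustration only and do not come from the paper.
TOY_SCORES = {
    "find_hotel(area='north')": -2.0,
    "find_hotel(area='south')": -9.5,
    "find_restaurant(area='north')": -7.1,
}


def toy_channel_logprob(utterance: str, candidate: str) -> float:
    # Ignores the utterance and looks up a fixed score per candidate.
    return TOY_SCORES[candidate]


best = rerank_noisy_channel(
    "I need a hotel in the north",
    list(TOY_SCORES),
    toy_channel_logprob,
)
print(best)
```

In an EM-style loop, the selected candidates would serve as pseudo-labels for the next round of inference; that loop is omitted here.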

Paper Structure (40 sections, 3 equations, 8 figures, 8 tables)

Introduction
Preliminaries
Method Overview
Inferring Latents via Noisy Channel
Inferring API Calls and Dialogue State
Inferring System Acts
Noisy Channel LLM Prompting
Retrieval-Augmented In-context Learning
Refining the Labels with Hard-EM
End-to-End System
DST
Policy
Response Generation
End-to-end Training
Experiments
...and 25 more sections

Figures (8)

Figure 1: An overview of our unsupervised dialogue problem. We assume 1) unlabeled goal-oriented dialogues between a user and agent and 2) a well-defined schema $\mathcal{S}$ with APIs suitable for fulfilling goals. We infer the unseen interactions between the agent and API, and use this to produce an end-to-end dialogue agent.
Figure 2: An overview of the latent variables annotated in our unsupervised labeling process which are used to train the dialogue model. Our DST Module (\ref{sec:methods-dst}) infers the API call(s) with arguments at each turn, from which we can derive the dialogue state change. Our DAT or Act Tagging module (\ref{sec:methods-tagging}) predicts the dialogue acts communicated in the observed system response, which can be used to infer de-lexicalized responses for training a response generator.
Figure 3: Instances from our 'direct' and 'noisy channel' prompts for DST. Best viewed in color. After sampling a DST completion from the 'direct' prompt, we score it by the likelihood of the input user utterance conditioned on it in the 'noisy channel' prompt.
Figure 4: Combined score ($0.5(\text{Inform} + \text{Success}) + \text{BLEU}$) vs. the number of steps of expectation-maximization in our Noisy Channel method vs. a Greedy Ablation. '0' is zero-shot inference.
Figure 5: log(Frequency) vs. Rank of dialogue acts used by each model over a 200-dialogue sample of the validation set. 'Natural' refers to human annotations. We find that our Noisy Channel approach uses a higher number of unique dialogue acts than the Greedy approach and better matches the characteristics of the distribution used by human annotators.
...and 3 more figures
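
The combined score in the Figure 4 caption is simple arithmetic, and a small helper makes the formula concrete. This is an illustrative sketch: the function name is hypothetical and the example inputs are invented, not results from the paper.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Combined score as given in the Figure 4 caption:
    0.5 * (Inform + Success) + BLEU."""
    return 0.5 * (inform + success) + bleu


# Invented example values, not taken from the paper.
example = combined_score(inform=80.0, success=60.0, bleu=10.0)
print(example)  # 0.5 * (80 + 60) + 10 = 80.0
```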
