
PARADISE: A Framework for Evaluating Spoken Dialogue Agents

Marilyn A. Walker, Diane J. Litman, Candace A. Kamm, Alicia Abella

TL;DR

PARADISE tackles the problem of evaluating spoken dialogue agents across tasks by separating what needs to be achieved from how it is achieved in dialogue. It introduces a decision-theoretic performance function that combines a task-based success measure, $\kappa$, with multiple dialogue-cost metrics $c_i$, using user satisfaction to learn the weights via linear regression; costs and success are normalized with $N(x) = (x - \overline{x})/\sigma_x$. The framework relies on an Attribute-Value Matrix (AVM) task representation to allow task-general evaluation and supports calculating performance for subdialogues as well as whole dialogues. The results illustrate how different dialogue strategies can be evaluated and compared, and emphasize careful generalization and iterative model refinement for predictive power.
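The performance function described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the per-dialogue data, the weight values, and the metric names (`num_turns`, `num_repairs`) are hypothetical, and the weights are simply assumed rather than fit by regressing user-satisfaction scores on the normalized factors, as PARADISE prescribes.

```python
import statistics

def z_normalize(values):
    """Apply N(x) = (x - mean) / stdev across a set of dialogues."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical data: task success (kappa) and two cost metrics,
# one value per dialogue.
kappa = [0.9, 0.6, 0.8, 0.4]
costs = {
    "num_turns":   [12, 20, 15, 25],
    "num_repairs": [1, 4, 2, 6],
}

# Assumed weights; in PARADISE these come from a linear regression
# of user satisfaction on the normalized success and cost factors.
alpha = 0.5
weights = {"num_turns": 0.3, "num_repairs": 0.2}

n_kappa = z_normalize(kappa)
n_costs = {name: z_normalize(vals) for name, vals in costs.items()}

# Performance = alpha * N(kappa) - sum_i w_i * N(c_i), per dialogue.
performance = [
    alpha * n_kappa[d] - sum(weights[m] * n_costs[m][d] for m in costs)
    for d in range(len(kappa))
]
```

Normalizing both success and costs puts all factors on a common scale, which is what lets the learned weights be compared across metrics and across agents performing different tasks.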

Abstract

This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.


Paper Structure

This paper contains 11 sections, 8 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: PARADISE's structure of objectives for spoken dialogue performance
  • Figure 2: Agent A dialogue interaction (Danieli and Gerbino, 1995)
  • Figure 3: Agent B dialogue interaction (Danieli and Gerbino, 1995)
  • Figure 4: Task-defined discourse structure of Agent A dialogue interaction
  • Figure 5: Hypothetical Agent C dialogue interaction
  • ...and 1 more figure