Prior Lessons of Incremental Dialogue and Robot Action Management for the Age of Language Models

Casey Kennington; Pierre Lison; David Schlangen

TL;DR

The paper finds that there is very little research on incremental dialogue management, offers some requirements for practical incremental dialogue management, and discusses the implications of incremental dialogue for embodied, robotic platforms in the age of large language models.

Abstract

Efforts towards endowing robots with the ability to speak have benefited from recent advancements in natural language processing, in particular large language models. However, current language models are not fully incremental: their processing is inherently monotonic, and they thus lack the ability to revise their interpretations or output in light of newer observations. This monotonicity has important implications for the development of dialogue systems for human-robot interaction. In this paper, we review the literature on interactive systems that operate incrementally (i.e., at the word level or below it). We motivate the need for incremental systems and survey incremental modeling of important aspects of dialogue such as speech recognition and language generation. Our primary focus is on the part of the system that makes decisions, known as the dialogue manager. We find that there is very little research on incremental dialogue management, offer some requirements for practical incremental dialogue management, and discuss the implications of incremental dialogue for embodied, robotic platforms in the age of large language models.

Paper Structure (25 sections, 4 figures)

This paper contains 25 sections, 4 figures.

Introduction
Background: Incremental Spoken Dialogue Systems
Spoken Dialogue Systems: Overview
Frameworks & Architectures
The Incremental Unit Framework
Restart vs. Update Incremental Models
Restart Incremental
Update Incremental
Common Modules in Incremental, Interactive Systems
Automatic Speech Recognition
Natural Language Understanding
Natural Language Generation and Speech Synthesis
Incremental Systems & Evaluation
Incremental systems improve over non-incremental counterparts
Challenges of incremental evaluation
...and 10 more sections

Figures (4)

Figure 1: Traditional architecture for spoken dialogue systems composed of Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Dialogue Management (DM), Natural Language Generation (NLG), and Text-to-Speech Synthesis (TTS).
Figure 2: From Kennington2021-fm, an example of Pointer, Word, POS, and SEM IU annotations for a sample from the Localized Narratives dataset. Solid lines denote SLLs, dashed lines denote GRINs, and dotted lines denote an alignment between two modalities. Image taken from https://google.github.io/localized-narratives.
Figure 3: From Pincus2017-oh, an example of human-human and game intelligent update dialogues with barge-in.
Figure 4: From Zilka2015-ul, a schematic of an LSTM-based dialogue state tracker.
