Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

Dong Won Lee; Hae Won Park; Yoon Kim; Cynthia Breazeal; Louis-Philippe Morency

TL;DR

This work describes an approach for aligning an LLM-based dialogue agent based on global rewards, while also taking into account naturally-occurring multimodal signals, and finds that it shows consistent improvements across various conversational metrics compared to baseline methods.

Abstract

We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally occurring multimodal signals. At a high level, our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal reward signals to crossmodally shape the reward decomposition step. This decomposed reward model is then used as part of the standard RLHF pipeline to improve an LLM-based dialogue agent. We run quantitative and qualitative human studies to evaluate the performance of our GELI approach, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.
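
To make the two ingredients concrete, the following is a toy sketch, not the paper's implementation. It assumes (hypothetically) that the turn-level rewards are meant to sum to the session-level score and that shaping is a squared-error pull toward a per-turn implicit signal such as facial affect; the abstract confirms neither choice. The function name `geli_style_loss` and the parameter `shaping_weight` are illustrative only.

```python
import numpy as np


def geli_style_loss(turn_rewards, global_reward, implicit_signal, shaping_weight=0.5):
    """Toy objective combining a GE decomposition term with an LI shaping term.

    Assumptions (not taken from the paper):
      * decomposition: the turn-level rewards should sum to the global reward
      * shaping: turn-level rewards should stay close to the implicit signal
    """
    turn_rewards = np.asarray(turn_rewards, dtype=float)
    implicit_signal = np.asarray(implicit_signal, dtype=float)

    # GE term: squared gap between the summed turn rewards and the session score.
    decomposition = (turn_rewards.sum() - global_reward) ** 2

    # LI term: mean squared distance to the per-turn implicit feedback.
    shaping = np.mean((turn_rewards - implicit_signal) ** 2)

    return float(decomposition + shaping_weight * shaping)


# Three turns whose rewards sum to the session score and match the implicit signal.
perfect = geli_style_loss([1.0, 2.0, 3.0], 6.0, [1.0, 2.0, 3.0])

# Two turns that sum correctly but disagree with the implicit signal.
shaped = geli_style_loss([1.0, 1.0], 2.0, [0.0, 0.0], shaping_weight=0.5)
```

In a real pipeline the turn rewards would come from a learned model over utterances, and the resulting reward function would then score generations during RLHF; the sketch only shows how a global term and a local term can be combined into one objective.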

Paper Structure (40 sections, 10 equations, 27 figures, 4 tables)

Introduction
Related Works
Reward Design
Temporal Credit Assignment
Aligning Language Models To Human Preferences
Utilizing Implicit Signals for Dialogue Agents
Background
Reinforcement Learning with Human Feedback (RLHF).
Methods: GELI
GE: Decomposing One Global Explicit Annotation
LI: Crossmodal Reward Shaping with Local Implicit Multimodal Signals
Experiments
Dataset
Baseline Models
Evaluation:
...and 25 more sections

Figures (27)

Figure 1: Overview of our proposed method: GELI. Left: The reward function training involves decomposing a single global explicit (GE) feedback, with the guidance of multimodal local implicit (LI) feedback, such as visual facial affect. Right: We utilize the decomposed reward function to update the language model, where the language model generates utterances and the reward function assigns a score to be optimized via PPO (Schulman et al., 2017).
Figure 2: Example of GELI reward score predictions for an unseen conversation from the dataset. Top left: Reward scores unrolled over an unseen conversation, with the mean subtracted. In a randomly sampled snippet, we find that our decomposed reward function assigns higher values to meaningful utterances.
Figure 3: Generated utterances with colors indicating aligned conversational topics. We display our proposed approach, GELI, alongside the human ground truth, the best-performing global explicit decomposition method (RRD), and local implicit rewards (visual affect and language sentiment). We find that GELI adapts the language model to generate more coherent, personable, and empathetic conversational responses.
Figure 4: Candor Demographics
Figure 5: MTurk experiment for human evaluation of generated samples
...and 22 more figures
