LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems

Nalin Kumar, Ondřej Dušek

TL;DR

The paper addresses linguistic entrainment in end-to-end task-oriented dialogue systems. It proposes three methods (instance weighting, a user token likelihood loss, and lexical keyword conditioning) to induce lexical alignment with the user in a GPT-2-based dialogue system, and evaluates them on MultiWOZ 2.1 against baselines and GPT-4. The results show improved lexical and syntactic entrainment with only a small cost to task success, and human evaluation favors certain variants for fluency and naturalness. The work shows that entrainment can be integrated into end-to-end dialogue systems, offering practical gains in naturalness and efficiency.

Abstract

Linguistic entrainment, or alignment, represents a phenomenon where linguistic patterns employed by conversational participants converge to one another. While entrainment has been shown to produce a more natural user experience, most dialogue systems do not have any provisions for it. In this work, we introduce methods for achieving dialogue entrainment in a GPT-2-based end-to-end task-oriented dialogue system through the utilization of shared vocabulary. We experiment with training instance weighting, entrainment-specific loss, and additional conditioning to generate responses that align with the user. We demonstrate that all three approaches produce significantly better entrainment than the base, non-entrainment-optimized model, as confirmed by both automated and manual evaluation metrics.
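
To make the idea of entrainment through shared vocabulary more concrete, the following is a minimal sketch of how lexical overlap between a user utterance and a system response could be measured and turned into a training instance weight. It is an illustration only, not the authors' formulation: the function names, the stopword list, the `alpha` parameter, and the `1 + alpha * overlap` weighting rule are all assumptions introduced here.

```python
import re

# Hypothetical, deliberately small stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "in", "i", "is", "are", "there",
             "to", "of", "and", "you", "it"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lexical_overlap(user_utterance, response):
    """Fraction of distinct content words in the response that the user also used.

    No stemming is applied, so 'restaurant' and 'restaurants' do not match.
    """
    user_words = set(tokenize(user_utterance)) - STOPWORDS
    response_words = set(tokenize(response)) - STOPWORDS
    if not response_words:
        return 0.0
    return len(user_words & response_words) / len(response_words)


def instance_weight(user_utterance, response, alpha=1.0):
    """Assumed weighting rule: up-weight training examples whose response
    reuses more of the user's vocabulary."""
    return 1.0 + alpha * lexical_overlap(user_utterance, response)


user = "I need a cheap restaurant in the centre"
system = "There are several cheap restaurants in the centre"
print(lexical_overlap(user, system))   # 2 shared of 4 content words
print(instance_weight(user, system))
```

In a training loop, a weight of this kind would typically multiply the per-example loss, so that responses which align lexically with the user contribute more to the gradient.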

Paper Structure (24 sections, 4 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Examples of linguistic entrainment in task-oriented dialogues from the MultiWOZ 2.1 dataset (dialogue IDs shown in brackets). While the responses in the dataset reuse the same words, a base model (Base-CE) produces a non-matching expression, hampering dialogue fluency. Our approach (LK-CE(0)) keeps the appropriate reuse. See the experiments and results section for model details.
  • Figure 2: Our GPT-4 prompt template (top) and example outputs (bottom). The CONTEXT, DS, DB_RESULTS and SLOTNAMES variables are filled in according to the current dialogue context, the (gold-standard) dialogue state and database results, and the slot names for the current domain. Examples 1 and 2 are quite fluent and syntactically aligned to the user. In Example 3, the model struggles with using slot placeholders and their correct values. This issue was quite frequent in our limited observation, even after trying several different prompts. Example 4 shows a self-contradicting response from the model.
  • Figure 3: In the first example, entrainment methods produce more natural and less automated-sounding outputs, even when the ground truth response itself looks less natural. In the second example, the outputs of models using entrainment methods incorporate the phrases "Can you", "assist me", and "with that", whereas the reranking method, D&J16, yields a less natural output. In the third example, the phrase "in the centre" is present in almost every output, but D&J16 and Base-CE struggle to sustain the conversation. The other methods, by contrast, continue the conversation with improved entrainment.