Enhancing Task-Oriented Dialogues with Chitchat: a Comparative Study Based on Lexical Diversity and Divergence

Armand Stricker, Patrick Paroubek

TL;DR

This paper tackles the problem of repetitive responses in task-oriented dialogues (TODs) by evaluating three chitchat augmentation strategies (Accentor, KETOD, and FusedChat) against a BST chitchat reference. It uses entropy-based measures (Shannon entropy and conditional entropy) and Jensen-Shannon divergence, at both corpus and token levels, to quantify lexical diversity and lexical divergence, including an analysis of the 20 most divergent tokens. Results show that FusedChat yields the largest diversity gains, Accentor often yields limited diversity gains despite higher engagement, and KETOD yields moderate gains, with chitchat notably grounded in external knowledge. The study suggests that integrating task and chitchat through more situated grounding (emotions, personas, external knowledge) can produce more natural and varied TODs, and can guide future dataset construction and model architectures toward richer, more human-like dialogue.

Abstract

As a recent development, task-oriented dialogues (TODs) have been enriched with chitchat in an effort to make dialogues more diverse and engaging. This enhancement is particularly valuable as TODs are often confined to narrow domains, making the mitigation of repetitive and predictable responses a significant challenge. This paper presents a comparative analysis of three chitchat enhancements, aiming to identify the most effective approach in terms of diversity. Additionally, we quantify the divergence between the added chitchat, the original task-oriented language, and chitchat typically found in chitchat datasets, highlighting the top 20 divergent keywords for each comparison. Our findings drive a discussion on future enhancements for augmenting TODs, emphasizing the importance of grounding dialogues beyond the task to achieve more diverse and natural exchanges.
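
As a rough illustration of the divergence analysis described above, the sketch below computes a Jensen-Shannon divergence between two unigram distributions and ranks tokens by their individual contribution to it. This is a minimal sketch under stated assumptions, not the authors' implementation: whitespace tokenization, lowercasing, base-2 logarithms, and the per-token decomposition are all choices made here for illustration, and the function names (`unigram_dist`, `token_jsd_contributions`, `top_divergent`) are hypothetical.

```python
import math
from collections import Counter


def unigram_dist(responses):
    """Relative token frequencies over a list of response strings.

    Assumption: lowercased whitespace tokenization stands in for
    whatever tokenizer the paper actually uses.
    """
    counts = Counter(tok for r in responses for tok in r.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def token_jsd_contributions(p, q):
    """Per-token share of JSD(P, Q), in bits.

    JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = 0.5 * (P + Q).
    Each token's term is non-negative, and the terms sum to the
    corpus-level JSD. The second tuple element records which side
    uses the token more often.
    """
    contrib = {}
    for tok in set(p) | set(q):
        pi, qi = p.get(tok, 0.0), q.get(tok, 0.0)
        m = 0.5 * (pi + qi)
        score = 0.0
        if pi > 0:
            score += 0.5 * pi * math.log2(pi / m)
        if qi > 0:
            score += 0.5 * qi * math.log2(qi / m)
        side = "first" if pi > qi else "second"
        contrib[tok] = (score, side)
    return contrib


def top_divergent(p, q, k=20):
    """The k tokens contributing most to JSD(P, Q), highest first."""
    contrib = token_jsd_contributions(p, q)
    return sorted(contrib.items(), key=lambda kv: kv[1][0], reverse=True)[:k]


def jsd(p, q):
    """Corpus-level JSD as the sum of the per-token contributions."""
    return sum(score for score, _ in token_jsd_contributions(p, q).values())
```

With base-2 logarithms the corpus-level value lies between 0 (identical distributions) and 1 (no shared vocabulary), which makes scores comparable across dataset pairs. The `side` label is one simple way to decide the direction of a bar in a back-to-back chart.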

Paper Structure (11 sections, 4 figures, 1 table)


Figures (4)

  • Figure 1: Dialogue examples from each dataset.
  • Figure 2: The bar chart presents the entropy for original and augmented responses for our three datasets, and BST. Considering that entropy is a logarithmic measure, the plot below the bar chart shows the uncertainty ratio between the original and augmented responses. For example, when considering trigrams, augmented responses in FusedChat contain approx. 1.89x more uncertainty than their purely task-oriented counterparts.
  • Figure 3: The bar chart presents the conditional entropy for original and enhanced responses, and BST. The plot below the bar chart should be read as in Figure 2.
  • Figure 4: Token-level divergences per dataset. The 20 most divergent tokens are shown in each case and are ranked according to their JSD scores. Bar directions are in accordance with each back-to-back chart title.
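
The captions for Figures 2 and 3 refer to n-gram entropy, conditional entropy, and an "uncertainty ratio" between augmented and original responses. The sketch below shows one way these quantities could be computed. It is an illustration under assumptions, not the paper's code: whitespace tokenization, base-2 logarithms, and conditioning on a single preceding token are choices made here, and reading the ratio as `2 ** (H_augmented - H_original)` is one plausible interpretation of the remark that entropy is a logarithmic measure. All function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shannon_entropy(responses, n=1):
    """Entropy (bits) of the n-gram distribution over all responses."""
    counts = Counter(g for r in responses for g in ngrams(r.split(), n))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(responses):
    """H(next token | previous token) in bits, estimated from bigrams.

    Assumption: a one-token context; the paper may condition differently.
    """
    bigrams = Counter(g for r in responses for g in ngrams(r.split(), 2))
    contexts = Counter()
    for (prev, _), c in bigrams.items():
        contexts[prev] += c
    total = sum(bigrams.values())
    return -sum(
        (c / total) * math.log2(c / contexts[prev])
        for (prev, _), c in bigrams.items()
    )


def uncertainty_ratio(h_augmented, h_original):
    """Convert an entropy gap in bits into a multiplicative ratio.

    Assumed reading of the Figure 2 caption: a gap of d bits means
    2 ** d times as many equally likely outcomes.
    """
    return 2 ** (h_augmented - h_original)
```

Under this reading, a ratio of about 1.89 would correspond to an entropy gap of roughly 0.92 bits, since 2 ** 0.92 is approximately 1.89.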