Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills

Eric Michael Smith, Mary Williamson, Kurt Shuster, Jason Weston, Y-Lan Boureau

TL;DR

The paper tackles the challenge of blending open-domain dialogue skills by evaluating several training schemes to combine single-skill models and by introducing the BlendedSkillTalk dataset. It demonstrates that multi-task training yields superior blended performance and that bias mitigation and fine-tuning on blended data further enhance results. Through automated metrics and human evaluations, the study shows balanced, credible performance across knowledge, empathy, and personal topics, with two-stage and multi-task approaches offering complementary advantages. The work provides a practical pathway toward open-domain agents capable of seamlessly integrating multiple skills and sets the stage for incorporating additional capabilities in future research.

Abstract

Being engaging, knowledgeable, and empathetic are all desirable general qualities in a conversational agent. Previous work has introduced tasks and datasets that aim to help agents to learn those qualities in isolation and gauge how well they can express them. But rather than being specialized in one single quality, a good open-domain conversational agent should be able to seamlessly blend them all into one cohesive conversational flow. In this work, we investigate several ways to combine models trained towards isolated capabilities, ranging from simple model aggregation schemes that require minimal additional training, to various forms of multi-task training that encompass several skills at all training stages. We further propose a new dataset, BlendedSkillTalk, to analyze how these capabilities would mesh together in a natural conversation, and compare the performance of different architectures and training schemes. Our experiments show that multi-tasking over several tasks that focus on particular capabilities results in better blended conversation performance compared to models trained on a single skill, and that both unified and two-stage approaches perform well if they are constructed to avoid unwanted bias in skill selection or are fine-tuned on our new task.

Paper Structure

This paper contains 23 sections, 1 figure, 7 tables.

Figures (1)

  • Figure 1: Sample conversation from the BlendedSkillTalk dataset, annotated with four conversation mode types (PB: personal background; K: knowledge; S: personal situation; E: empathy). The guided (G) and unguided (U) workers are given personas and a topic. The conversation has been seeded with two utterances from a conversation sampled from WoW. When the guided worker selected one of the suggestions, it is shown in shaded grey.