
ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach

Reem Gody, Mahmoud Goudy, Ahmed Y. Tawfik

TL;DR

ConvoGen introduces a multi-agent framework built on AutoGen to generate synthetic open-domain conversational data with persona-driven agents. It couples an experience generator (powered by GPT-4o and few-shot learning) with iterative sampling and a group-chat instantiation process. The resulting data achieves high lexical diversity, as measured by MTLD (Measure of Textual Lexical Diversity), and strong grounding to the input experiences, as rated by an LLM-based judge. The approach is evaluated against several human baselines across multiple configurations; the results show that iterative sampling increases diversity and that the generated data can be well grounded in topic, situation, and personas. While the method is promising for augmenting multi-party conversational datasets, the work notes potential risks from content bias or harmful outputs and emphasizes the need for safety filters and careful prompt tuning to ensure reliability in practice.
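To make the diversity metric concrete, here is a minimal sketch of MTLD as commonly defined (McCarthy and Jarvis): walk through the tokens, and each time the running type-token ratio (TTR) falls to a threshold (conventionally 0.72) count one "factor" and reset; add a partial factor for the leftover segment; divide the token count by the factor count; and average a forward and a backward pass. The tokenizer (lowercased `\w+` words) and the 0.72 threshold are assumptions; the exact settings used in the paper are not given here.

```python
import re


def _mtld_one_pass(tokens, threshold):
    """One directional MTLD pass: tokens / number of TTR 'factors'."""
    factors = 0.0
    types = set()
    count = 0
    ttr = 1.0
    for tok in tokens:
        count += 1
        types.add(tok)
        ttr = len(types) / count
        if ttr <= threshold:
            # Segment reached the threshold: count a full factor and reset.
            factors += 1.0
            types = set()
            count = 0
            ttr = 1.0
    if count > 0:
        # Partial factor for the unfinished final segment.
        factors += (1.0 - ttr) / (1.0 - threshold)
    if factors == 0:
        return float("nan")  # undefined: no repetition at all in the text
    return len(tokens) / factors


def mtld(text, threshold=0.72):
    """Average of forward and backward MTLD passes (assumed tokenization)."""
    tokens = re.findall(r"\w+", text.lower())
    forward = _mtld_one_pass(tokens, threshold)
    backward = _mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2
```

Higher values mean the text sustains a high type-token ratio for longer, i.e. it is more lexically diverse; a fully repetitive string such as `"a a a a"` scores 2.0.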

Abstract

In this paper, we present ConvoGen: an innovative framework for generating synthetic conversational data using multi-agent systems. Our method leverages few-shot learning and introduces iterative sampling from a dynamically updated few-shot hub to create diverse and realistic conversational scenarios. The generated data has numerous applications, including training and evaluating conversational AI models and augmenting existing datasets for tasks such as conversational intent classification or conversation summarization. Our experiments demonstrate the effectiveness of this method in producing high-quality, diverse synthetic conversational data, highlighting its potential to enhance the development and evaluation of conversational AI systems.
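The idea of iterative sampling from a dynamically updated few-shot hub can be sketched as a loop: sample a few examples from the hub, build a few-shot prompt, generate a new experience, and append it to the hub so that later prompts can draw on it. This is an illustrative sketch, not the authors' implementation; the names `build_prompt`, `iterative_sampling`, and `fake_generate` are hypothetical, the prompt wording is invented, and the assumption that every generated experience is added back to the hub is ours. In practice `generate` would wrap an LLM call (the paper uses GPT-4o).

```python
import random
from typing import Callable, List, Optional, Tuple


def build_prompt(examples: List[str]) -> str:
    """Hypothetical few-shot prompt: numbered examples plus an instruction."""
    shots = "\n\n".join(f"Example {i + 1}:\n{ex}" for i, ex in enumerate(examples))
    return f"{shots}\n\nGenerate a new, different experience:"


def iterative_sampling(
    seed_hub: List[str],
    generate: Callable[[str], str],
    n_iterations: int,
    k: int = 3,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Sample k shots per step from a hub that grows with each generation."""
    rng = rng or random.Random()
    hub = list(seed_hub)  # copy so the caller's seed list is not mutated
    outputs: List[str] = []
    for _ in range(n_iterations):
        shots = rng.sample(hub, min(k, len(hub)))
        new_experience = generate(build_prompt(shots))
        hub.append(new_experience)  # assumed: the hub is updated dynamically
        outputs.append(new_experience)
    return outputs, hub


# Usage with a stand-in generator instead of a real LLM call.
seed = [
    "Two friends plan a hiking trip.",
    "Coworkers debate a project deadline.",
    "Neighbors discuss a noisy party.",
]
calls: List[str] = []


def fake_generate(prompt: str) -> str:
    calls.append(prompt)
    return f"Synthetic experience #{len(calls)}"


outputs, hub = iterative_sampling(seed, fake_generate, n_iterations=5, k=2, rng=random.Random(0))
```

Because newly generated items enter the pool, later prompts mix seed and synthetic examples, which is one plausible mechanism by which iterative sampling could increase diversity.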

Paper Structure

This paper contains 15 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: The procedure for generating conversations using ConvoGen. First, the experience generator generates the personas, their relations, a situation, a topic and a conversation starter. Next, a group of agents initialized using these personas engage in a conversation using the generated conversation starter.
  • Figure 1: Method 1: Prompt for generating experiences using few-shot learning
  • Figure 2: Method 2: Prompt for generating experiences using few-shot learning and input personas sampled from the persona hub
  • Figure 3: The message that is sent by the user proxy to the chat manager to initiate the group conversation, and an example of a generated group chat conversation.
  • Figure 3: Guidelines for each agent to drive the conversation
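The two-stage procedure in Figure 1 (generate an experience, then run a persona-driven group chat seeded with its conversation starter) can be sketched with plain data structures. This sketch deliberately avoids AutoGen's API: `Experience`, `PersonaAgent`, `make_agents`, and `run_group_chat` are hypothetical names, the system-message format is invented, round-robin speaker order is an assumption (the paper's chat manager may select speakers differently), and each agent's `respond` function stands in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (speaker name, text)


@dataclass
class Experience:
    personas: Dict[str, str]  # agent name -> persona description
    relations: str
    situation: str
    topic: str
    conversation_starter: str


@dataclass
class PersonaAgent:
    name: str
    system_message: str
    respond: Callable[[List[Message]], str]  # stand-in for an LLM call


def make_agents(exp: Experience, respond_factory: Callable[[str], Callable]) -> List[PersonaAgent]:
    """Initialize one agent per generated persona, grounded in the experience."""
    agents = []
    for name, persona in exp.personas.items():
        system_message = (
            f"You are {name}. {persona}\nRelations: {exp.relations}\n"
            f"Situation: {exp.situation}\nTopic: {exp.topic}"
        )
        agents.append(PersonaAgent(name, system_message, respond_factory(name)))
    return agents


def run_group_chat(exp: Experience, agents: List[PersonaAgent], max_turns: int) -> List[Message]:
    """The user proxy opens with the starter; agents then speak round-robin."""
    history: List[Message] = [("user_proxy", exp.conversation_starter)]
    for turn in range(max_turns):
        speaker = agents[turn % len(agents)]
        history.append((speaker.name, speaker.respond(history)))
    return history


# Usage with scripted responders in place of LLM-backed agents.
def scripted_respond_factory(name: str) -> Callable[[List[Message]], str]:
    def respond(history: List[Message]) -> str:
        return f"{name} replies to {history[-1][0]}"
    return respond


exp = Experience(
    personas={"Alice": "A cautious hiker.", "Bob": "An adventurous climber."},
    relations="Alice and Bob are old friends.",
    situation="Planning a weekend trip.",
    topic="Choosing a trail",
    conversation_starter="Where should we hike this weekend?",
)
agents = make_agents(exp, scripted_respond_factory)
history = run_group_chat(exp, agents, max_turns=4)
```

Keeping the experience as a single structured record makes it straightforward to hand the same fields to an LLM judge later when checking whether the conversation stays grounded in the topic, situation, and personas.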