Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues

KuanChao Chu, Yi-Pei Chen, Hideki Nakayama

TL;DR

The paper addresses quality degradation in LLM-powered multi-agent dialogues, identifying repetition, inconsistency, and hallucination as core issues that propagate over time. It introduces the Screening, Diagnosis, and Regeneration (SDR) framework, which combines evidence gathering from past dialogues, a Natural Language Inference-Graph (NLI-G) for detecting inconsistency, and iterative regeneration to produce more diverse, consistent, and factual conversations. In experiments on the OneDayLife dataset, SDR improves corpus-level diversity, factualness, consistency, and fluency while reducing repetitive keyword usage and preserving dialogue integrity over time. The work offers a scalable, on-the-fly correction approach for open-domain multi-agent simulations and informs future research on robust, long-horizon agent interactions.

Abstract

This paper investigates the quality of multi-agent dialogues in simulations powered by Large Language Models (LLMs). Analyzing dialogues and memory over multiple sessions revealed significant issues such as repetition, inconsistency, and hallucination, exacerbated by the propagation of erroneous information. To combat these challenges, we propose a novel Screening, Diagnosis, and Regeneration (SDR) framework that detects and corrects utterance errors through a comprehensive process involving immediate issue identification, evidence gathering from past dialogues, and LLM analysis for utterance revision. By incorporating our SDR framework into Generative Agents (Park et al., 2023), we enhance the diversity, consistency, and factualness of the generated dialogues. This work presents a pioneering approach to enhancing dialogue quality in multi-agent simulations, establishing a new standard for future research in the field.
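
The three stages named in the abstract (immediate issue identification, evidence gathering from past dialogues, and analysis leading to utterance revision) can be pictured as a small control loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names, the `Diagnosis` type, the `max_rounds` limit, and the toy stand-ins that replace the LLM calls are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Diagnosis:
    """Hypothetical result of the diagnosis stage."""
    ok: bool
    feedback: str = ""


def sdr_generate(
    generate: Callable[[List[str], Optional[str]], str],
    screen: Callable[[str, List[str]], bool],
    gather_evidence: Callable[[str, List[str]], List[str]],
    diagnose: Callable[[str, List[str]], Diagnosis],
    history: List[str],
    max_rounds: int = 3,  # assumed cap on regeneration attempts
) -> str:
    """Screen a candidate utterance, diagnose it against evidence, regenerate if needed."""
    utterance = generate(history, None)
    for _ in range(max_rounds):
        # Screening: a cheap check that decides whether the utterance looks suspicious.
        if not screen(utterance, history):
            return utterance
        # Diagnosis: collect related past dialogue and judge the utterance against it.
        evidence = gather_evidence(utterance, history)
        result = diagnose(utterance, evidence)
        if result.ok:
            return utterance
        # Regeneration: ask the generator again, passing the diagnostic feedback.
        utterance = generate(history, result.feedback)
    return utterance


# Toy stand-ins for the LLM-backed stages (assumptions for demonstration only).
def toy_generate(history: List[str], feedback: Optional[str]) -> str:
    if feedback is None:
        return history[-1] if history else "Hello!"
    return "Let's talk about something new."


def toy_screen(utterance: str, history: List[str]) -> bool:
    # Flags an utterance that exactly repeats an earlier one (case-insensitive).
    return utterance.lower() in (h.lower() for h in history)


def toy_gather(utterance: str, history: List[str]) -> List[str]:
    return [h for h in history if h.lower() == utterance.lower()]


def toy_diagnose(utterance: str, evidence: List[str]) -> Diagnosis:
    if evidence:
        return Diagnosis(ok=False, feedback="Avoid repeating: " + evidence[0])
    return Diagnosis(ok=True)


history = ["Shall we collaborate on the project?"]
final = sdr_generate(toy_generate, toy_screen, toy_gather, toy_diagnose, history)
print(final)  # prints "Let's talk about something new."
```

In this toy run the first candidate repeats the only line in the history, so it is flagged, diagnosed as a repetition, and regenerated once. Passing the stages in as callables keeps the loop independent of any particular LLM client.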

Paper Structure (52 sections, 1 equation, 7 figures, 6 tables, 1 algorithm)

Figures (7)

  • Figure 1: Example dialogues from OneDayLife showing problems of repetition, inconsistency, and hallucination. Each agent name is colored, and the bold-colored phrases indicate the mentioned attribute's owner or the original speaker.
  • Figure 2: The spread of the keyword "collaboration" in OneDayLife. Left: The number of dialogues in each time span and the ratio of those that include the keyword. Middle: Number of dialogues with the keyword in the first 20% of spreading time. Each line represents a dialogue between two agents, and the line color indicates the identity of the agent who first mentions the keyword. Right: Number of dialogues with the keyword over the entire time span.
  • Figure 3: Overview of the proposed Screening, Diagnosis, and Regeneration (SDR) framework, an instant error correction method for multi-agent simulated dialogues. The modules in green are run by the LLM.
  • Figure 4: Examples of hallucination screening. In Case 1, although Abigail is mentioned, it pertains only to Rajiv's personal plan, not to a fact about Abigail. In Case 2, Ryan objectively describes a past event involving Carlos. However, this event could have been entirely fabricated by Ryan, representing a potential harmful hallucination.
  • Figure 5: Diversity Analysis
  • ...and 2 more figures
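
The left panel of Figure 2 reports, for each time span, the number of dialogues and the ratio of them that contain a keyword. A minimal sketch of that tally follows. It assumes each dialogue is a `(timestamp, text)` pair and that time spans are fixed-width bins; the data format, the function name `keyword_ratio_by_span`, and the sample dialogues are hypothetical and are not taken from OneDayLife.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def keyword_ratio_by_span(
    dialogues: Iterable[Tuple[float, str]],
    keyword: str,
    span: float,
) -> List[Tuple[int, int, float]]:
    """Return (bin index, dialogue count, ratio containing keyword) per time bin."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    needle = keyword.lower()
    for timestamp, text in dialogues:
        bin_index = int(timestamp // span)  # fixed-width time bin (an assumption)
        totals[bin_index] += 1
        if needle in text.lower():  # case-insensitive substring match
            hits[bin_index] += 1
    return [(b, totals[b], hits[b] / totals[b]) for b in sorted(totals)]


# Made-up sample: timestamps are in arbitrary units.
sample = [
    (0, "Nice weather today."),
    (5, "How about a collaboration?"),
    (12, "Collaboration sounds great."),
    (14, "Our collaboration continues."),
]
stats = keyword_ratio_by_span(sample, "collaboration", span=10)
print(stats)  # prints [(0, 2, 0.5), (1, 2, 1.0)]
```

A ratio that rises across bins, as in this toy sample, is the pattern the figure uses to show a keyword spreading through the agent population.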