Persona Inconstancy in Multi-Agent LLM Collaboration: Conformity, Confabulation, and Impersonation
Razan Baltaji, Babak Hemmatian, Lav R. Varshney
TL;DR
This work probes how reliably LLM agents in multi-agent systems maintain assigned national personas and contribute to culturally diverse group decisions. Using a three-stage framework (onboarding, debate/collaboration, reflection) with entropy-based diversity controls, the authors quantify conformity, impersonation, and confabulation across debate and collaboration modes. They find that group diversity leads to collective decisions that more often reflect diverse perspectives, but that this effect is tempered by conformity pressures and occasional persona drift, and that instructions encouraging debate rather than collaboration increase the rate of inconstancy. The results emphasize the need to diagnose and mitigate sources of persona instability to unlock the full potential of multi-agent simulations for scientific, diplomatic, and policy-relevant tasks.
Abstract
Multi-agent AI systems can be used for simulating collective decision-making in scientific and practical applications. They can also be used to introduce a diverse group discussion step in chatbot pipelines, enhancing the cultural sensitivity of the chatbot's responses. These applications, however, are predicated on the ability of AI agents to reliably adopt assigned personas and mimic human interactions. To see whether LLM agents satisfy these requirements, we examine AI agent ensembles engaged in cross-national collaboration and debate by analyzing their private responses and chat transcripts. Our findings suggest that multi-agent discussions can support collective AI decisions that more often reflect diverse perspectives, yet this effect is tempered by the agents' susceptibility to conformity due to perceived peer pressure and occasional challenges in maintaining consistent personas and opinions. Instructions that encourage debate in support of one's opinions rather than collaboration increase the rate of inconstancy. Without addressing the factors we identify, the full potential of multi-agent frameworks for producing more culturally diverse AI outputs or more realistic simulations of group decision-making may remain untapped.
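The TL;DR refers to entropy-based diversity controls without giving a formula. The sketch below is only an illustration of one plausible reading, assuming diversity is measured as the Shannon entropy of the agents' categorical opinions before discussion. The function name `opinion_entropy` and the example opinion labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def opinion_entropy(opinions):
    """Shannon entropy, in bits, of a group's categorical opinions.

    Assumption for illustration: each agent holds exactly one opinion
    label, and group diversity is the entropy of the label distribution.
    """
    total = len(opinions)
    if total == 0:
        raise ValueError("opinions must be non-empty")
    counts = Counter(opinions)
    # Sum p * log2(1 / p) over the observed opinion labels.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical groups of four agents.
unanimous = ["agree", "agree", "agree", "agree"]
evenly_split = ["agree", "agree", "disagree", "disagree"]

print(opinion_entropy(unanimous))     # 0.0 (no diversity)
print(opinion_entropy(evenly_split))  # 1.0 (maximal for two opinions)
```

Under this reading, a group with zero entropy starts in full agreement, while higher entropy marks a more divided group, which would allow conformity to be compared across groups with different levels of initial disagreement.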
