Can Lessons From Human Teams Be Applied to Multi-Agent Systems? The Role of Structure, Diversity, and Interaction Dynamics
Rasika Muralidharan, Haewoon Kwak, Jisun An
TL;DR
The paper investigates whether insights from human team science on team structure, diversity, and interaction dynamics can improve multi-agent systems driven by large language models. It introduces a framework that compares flat and hierarchical team structures, assigns demographic personas to agents, and evaluates teams on four reasoning-focused tasks (CommonsenseQA, StrategyQA, Social IQa, and Latent Implicit Hate), using elicitation and LLM-based judgments to assess performance and coordination. Flat teams generally outperform hierarchical ones on these tasks, while the effect of demographic diversity is nuanced and depends on both task and structure. Agents are overconfident about team performance before the task, whereas their post-task reflections reveal integration challenges, especially in hierarchical setups. The work offers design implications for interpretable, collaborative AI teams and suggests future work on adaptive structures, diverse and cross-cultural evaluation, and deeper interpretability of how teams reason, disagree, and converge over time.
Abstract
Multi-Agent Systems (MAS) with Large Language Model (LLM)-powered agents are gaining attention, yet few studies explore their team dynamics. Inspired by human team science, we propose a multi-agent framework to examine core aspects of team science: structure, diversity, and interaction dynamics. We evaluate team performance across four tasks: CommonsenseQA, StrategyQA, Social IQa, and Latent Implicit Hate, spanning commonsense and social reasoning. Our results show that flat teams tend to perform better than hierarchical ones, while diversity has a nuanced impact. Interviews suggest agents are overconfident about their team performance, yet post-task reflections reveal both appreciation for collaboration and challenges in integration, including limited conversational coordination.
