Can Lessons From Human Teams Be Applied to Multi-Agent Systems? The Role of Structure, Diversity, and Interaction Dynamics

Rasika Muralidharan, Haewoon Kwak, Jisun An

TL;DR

The paper investigates whether insights from human team science, specifically team structure, diversity, and interaction dynamics, can improve multi-agent systems driven by large language models. It introduces a framework that compares flat and hierarchical structures, assigns demographic personas to agents, and evaluates teams on four reasoning-focused tasks (CommonsenseQA, StrategyQA, Social IQa, and Implicit Hate), using elicitation and LLM-based judgments to assess performance and coordination. Flat teams generally outperform hierarchical ones on reasoning tasks, while the effect of demographic diversity is nuanced and depends on both task and structure. Agents are overconfident before the task, yet their post-task reflections reveal integration challenges, especially in hierarchical setups. The work offers design implications for interpretable, collaborative AI teams and suggests future work on adaptive structures, diverse and cross-cultural evaluation, and deeper interpretability to understand how teams reason, disagree, and converge over time.

Abstract

Multi-Agent Systems (MAS) with Large Language Model (LLM)-powered agents are gaining attention, yet fewer studies explore their team dynamics. Inspired by human team science, we propose a multi-agent framework to examine core aspects of team science: structure, diversity, and interaction dynamics. We evaluate team performance across four tasks: CommonsenseQA, StrategyQA, Social IQa, and Latent Implicit Hate, spanning commonsense and social reasoning. Our results show that flat teams tend to perform better than hierarchical ones, while diversity has a nuanced impact. Interviews suggest agents are overconfident about their team performance, yet post-task reflections reveal both appreciation for collaboration and challenges in integration, including limited conversational coordination.

Paper Structure

This paper contains 49 sections, 16 figures, 24 tables, 2 algorithms.

Figures (16)

  • Figure 1: Conversation flows in (a) flat and (b) hierarchical teams. In flat teams, agents respond independently and iteratively refine their answers. In hierarchical teams, leader agents issue instructions and determine the final answer based on others' responses.
  • Figure 2: Average score for Q$^{\text{pre}}_3$, Q$^{\text{pre}}_4$, Q$^{\text{pre}}_5$. a) flat structure. b) hierarchical structure.
  • Figure 3: Average score across all post-elicitation probing questions. a) flat structure. b) hierarchical structure.
  • Figure 4: Trend of team diversity and performance in flat teams and hierarchical teams for Implicit Hate.
  • Figure 5: Trend of team diversity and performance in flat teams and hierarchical teams for CS dataset. $x$-axis represents the level of team diversity, calculated through Gini Index, and $y$-axis represents performance of teams.
  • ...and 11 more figures
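
The Figure 5 caption states that team diversity is calculated through a Gini Index, but this page does not give the formula. As a minimal sketch, assuming the common categorical form (Gini impurity, one minus the sum of squared group proportions) applied to a single persona attribute, the score could be computed as follows. The function name `gini_diversity` and the example labels are hypothetical and may not match the authors' implementation.

```python
from collections import Counter


def gini_diversity(labels):
    """Gini impurity of a list of categorical labels: 1 - sum(p_i ** 2).

    Assumption: this is one common reading of "Gini Index" for categorical
    diversity; the paper's exact definition is not shown on this page.
    Returns 0.0 for a fully homogeneous team and approaches 1.0 as the
    team is spread evenly over many categories.
    """
    if not labels:
        raise ValueError("labels must contain at least one team member")
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical persona attribute values for a three-agent team.
homogeneous_team = ["group_a", "group_a", "group_a"]
mixed_team = ["group_a", "group_b", "group_a"]

print(gini_diversity(homogeneous_team))
print(gini_diversity(mixed_team))
```

Under this reading, a team whose agents all share one attribute value scores zero, and the score rises as the attribute values become more evenly distributed across the team.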