The Social Cost of Intelligence: Emergence, Propagation, and Amplification of Stereotypical Bias in Multi-Agent Systems

Thi-Nhung Nguyen, Linhao Luo, Thuy-Trang Vu, Dinh Phung

TL;DR

This work studies stereotypical bias in multi-agent systems (MAS) of LLMs by formalizing a MAS as a directed graph of agents with group identities and communication protocols. Through simulations on CrowS-Pairs, StereoSet, and BBQ, the authors measure bias emergence, propagation, and amplification under different LLMs and interaction styles, finding that MAS are generally less robust than single-agent systems and that in-group favoritism drives early bias. Cooperative and debate-based communication can mitigate amplification, and stronger underlying LLMs improve overall robustness; neutrality and protocol design also significantly affect fairness. The study also explores bias injection attacks and defense mechanisms, finding that neutral boosts and robust models offer the strongest resilience. Overall, the results underscore the need for robust model selection, thoughtful communication protocols, and multi-perspective reasoning to curb bias in MAS, and they point to avenues for defense against adversarial manipulation.

Abstract

Bias in large language models (LLMs) remains a persistent challenge, manifesting in stereotyping and unfair treatment across social groups. While prior research has primarily focused on individual models, the rise of multi-agent systems (MAS), where multiple LLMs collaborate and communicate, introduces new and largely unexplored dynamics in bias emergence and propagation. In this work, we present a comprehensive study of stereotypical bias in MAS, examining how internal specialization, underlying LLMs and inter-agent communication protocols influence bias robustness, propagation, and amplification. We simulate social contexts where agents represent different social groups and evaluate system behavior under various interaction and adversarial scenarios. Experiments on three bias benchmarks reveal that MAS are generally less robust than single-agent systems, with bias often emerging early through in-group favoritism. However, cooperative and debate-based communication can mitigate bias amplification, while more robust underlying LLMs improve overall system stability. Our findings highlight critical factors shaping fairness and resilience in multi-agent LLM systems.

Paper Structure

This paper contains 28 sections, 3 equations, 19 figures, 5 tables.

Figures (19)

  • Figure 1: Example of Stereotypical Bias in MAS. The biased answers are A and B. Bias emerges when agent 1 selects A, propagates to agent 2, who then concedes and aligns with A, and is amplified as the biased answer A becomes increasingly dominant.
  • Figure 2: System Robustness of MAS across LLM families on the BBQ dataset. Agents’ social groups are drawn from different groups within the intra-group pool.
  • Figure 3: Emergence, propagation, and amplification of stereotypical bias in MAS using GPT-4.1-mini on the BBQ dataset. Agents’ social groups are drawn from different groups within the intra-group pool.
  • Figure 4: Robustness under varying numbers of attacked agents and total agents in MAS. (a) MAS with Llama-3.1-8b. (b) MAS with GPT-4o-mini.
  • Figure 5: Robustness to Bias Attacks of MAS Across Different Defense Mechanisms. Note: Llama-3.1-8b-Instruct results are min-max normalized for better visualization: Amplification is scaled from its original range [0.96, 1.90] to [0.8, 1.0], and Robustness from [0.03, 0.06] to [0.4, 0.5] to roughly match the scale of the other models. All other models remain in their original scale.
  • ...and 14 more figures
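
The rescaling mentioned in the Figure 5 caption can be illustrated with a standard linear min-max mapping. The sketch below is an assumption about how such a mapping is usually computed, not the authors' plotting code; the function name `rescale` and the sample midpoint value are hypothetical, and only the source and target ranges are taken from the caption.

```python
def rescale(value, src_min, src_max, dst_min, dst_max):
    """Linearly map `value` from [src_min, src_max] to [dst_min, dst_max].

    Hypothetical helper: a generic min-max rescale, assumed to match the
    normalization described in the Figure 5 caption.
    """
    if src_max == src_min:
        raise ValueError("source range must have nonzero width")
    fraction = (value - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


# Ranges quoted in the caption for Llama-3.1-8b-Instruct.
AMPLIFICATION_SRC = (0.96, 1.90)
AMPLIFICATION_DST = (0.8, 1.0)
ROBUSTNESS_SRC = (0.03, 0.06)
ROBUSTNESS_DST = (0.4, 0.5)

# Endpoints of the source range land on the endpoints of the target range.
amp_low = rescale(0.96, *AMPLIFICATION_SRC, *AMPLIFICATION_DST)
amp_high = rescale(1.90, *AMPLIFICATION_SRC, *AMPLIFICATION_DST)

# An illustrative value halfway through the source range (not a reported
# result) lands halfway through the target range.
amp_mid = rescale(1.43, *AMPLIFICATION_SRC, *AMPLIFICATION_DST)

rob_high = rescale(0.06, *ROBUSTNESS_SRC, *ROBUSTNESS_DST)
```

Because the mapping is linear, it preserves the ordering and relative spacing of values within each metric, but the rescaled numbers are no longer directly comparable in magnitude to the unscaled results of the other models.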