Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions

Angana Borah, Rada Mihalcea

TL;DR

This study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate them: self-reflection with in-context examples (ICE) and supervised fine-tuning.

Abstract

As Large Language Models (LLMs) continue to evolve, they are increasingly being employed in numerous studies to simulate societies and execute diverse social tasks. However, LLMs are susceptible to societal biases due to their exposure to human-generated data. Given that LLMs are being used to gain insights into various societal aspects, it is essential to mitigate these biases. To that end, our study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate these biases. We begin by creating a dataset of scenarios where implicit gender biases might arise, and subsequently develop a metric to assess the presence of biases. Our empirical analysis reveals that LLMs generate outputs characterized by strong implicit bias associations (≥ 50% of the time). Furthermore, these biases tend to escalate following multi-agent interactions. To mitigate them, we propose two strategies: self-reflection with in-context examples (ICE) and supervised fine-tuning. Our research demonstrates that both methods effectively mitigate implicit biases, with the ensemble of fine-tuning and self-reflection proving to be the most successful.

Paper Structure (40 sections, 2 equations, 22 figures, 3 tables)

Figures (22)

  • Figure 1: Interaction framework. Shows four rounds of interaction: a first assignment of tasks, followed by two discussion rounds, and a final assignment. Each agent is a different LLM assuming a different persona. We randomize the order of agents in our framework to eliminate position bias.
  • Figure 2: Example from the Scenarios Dataset, from the 'School' domain
  • Figure 3: Domain-based analysis in the 'no-interaction' setting. Biases differ across domains. All scores are positive, showing biases towards males by all models.
  • Figure 4: Domain-based analysis in the 'interaction' setting. All scores are positive, showing biases towards males. Biases increase after interaction for all domains across models and settings.
  • Figure 5: Implicit bias mitigation strategies in multi-agent LLM interaction. We show FT, SR, and an ensemble of FT and SR. (FT: Fine-tuning, SR: Self-Reflection)
  • ...and 17 more figures
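
The four-round interaction described in the Figure 1 caption can be sketched roughly as below. This is only an illustrative outline, not the authors' implementation: the `Agent` callable type, the `run_interaction` function, the round labels, and the `echo_agent` stand-in are all hypothetical names, and shuffling the agent order once per run (rather than once per round) is an assumption, since the caption only says the order is randomized.

```python
import random
from typing import Callable, Dict, List

# Hypothetical agent interface (an assumption, not from the paper):
# an agent receives its persona name and the transcript so far, and returns a message.
Agent = Callable[[str, List[str]], str]

# Round structure taken from the Figure 1 caption; the label strings are illustrative.
ROUNDS = ["first assignment", "discussion 1", "discussion 2", "final assignment"]


def run_interaction(agents: Dict[str, Agent], scenario: str, seed: int = 0) -> List[str]:
    """Run the four rounds with a randomized agent order and return the transcript."""
    rng = random.Random(seed)
    order = list(agents)
    rng.shuffle(order)  # assumption: shuffle once per run to reduce position bias

    transcript = [f"scenario: {scenario}"]
    for round_name in ROUNDS:
        for name in order:
            message = agents[name](name, transcript)
            transcript.append(f"[{round_name}] {name}: {message}")
    return transcript


def echo_agent(name: str, transcript: List[str]) -> str:
    """Stand-in for an LLM call; a real agent would prompt a model with its persona."""
    return f"{name} has seen {len(transcript)} lines"


if __name__ == "__main__":
    personas = {"agent_a": echo_agent, "agent_b": echo_agent, "agent_c": echo_agent}
    for line in run_interaction(personas, "example scenario", seed=1):
        print(line)
```

Passing a fixed `seed` keeps a run reproducible, while varying it across scenarios changes which agent speaks first.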