Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions

Angana Borah; Rada Mihalcea

TL;DR

This study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate them: self-reflection with in-context examples (ICE) and supervised fine-tuning.

Abstract

As Large Language Models (LLMs) continue to evolve, they are increasingly being employed in numerous studies to simulate societies and execute diverse social tasks. However, LLMs are susceptible to societal biases due to their exposure to human-generated data. Given that LLMs are being used to gain insights into various societal aspects, it is essential to mitigate these biases. To that end, our study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate these biases. We begin by creating a dataset of scenarios where implicit gender biases might arise, and subsequently develop a metric to assess the presence of biases. Our empirical analysis reveals that LLMs generate outputs characterized by strong implicit bias associations (≥ 50% of the time). Furthermore, these biases tend to escalate following multi-agent interactions. To mitigate them, we propose two strategies: self-reflection with in-context examples (ICE) and supervised fine-tuning. Our research demonstrates that both methods effectively mitigate implicit biases, with the ensemble of fine-tuning and self-reflection proving to be the most successful.

Paper Structure (40 sections, 2 equations, 22 figures, 3 tables)


Introduction
Related Work
Dataset
A Metric for Bias Evaluation
Bias Detection using Multi-Agent LLM Interaction
Experiments and Results: Bias Detection
Multi-agent interaction
Domain-based Analysis
Bias Mitigation
Fine-tuning (FT) LLM
Self-reflection Prompting With and Without In-context Examples
Integrating Mitigation Strategies into the Interactions
Experiments and Results: Bias Mitigation
Conclusion and Lessons Learned
Limitations
...and 25 more sections

Figures (22)

Figure 1: Interaction framework. Displays four rounds of interaction: the first assignment of tasks, followed by two discussion rounds, and the final assignment. Each agent is a different LLM assuming different personas. We randomize the order of agents in our framework to eliminate position bias.
Figure 2: Example from the Scenarios Dataset, from the 'School' domain
Figure 3: Domain-based analysis in the 'no-interaction' setting. Biases differ across domains. All scores are positive, showing biases towards males by all models.
Figure 4: Domain-based analysis in the 'interaction' setting. All scores are positive, showing biases towards males. Biases increase after interaction for all domains across models and settings.
Figure 5: Implicit bias mitigation strategies in multi-agent LLM interaction. We show FT, SR, and an ensemble of FT and SR. (FT: Fine-tuning, SR: Self-reflection)
...and 17 more figures
