A Multi-LLM Debiasing Framework

Deonna M. Owens; Ryan A. Rossi; Sungchul Kim; Tong Yu; Franck Dernoncourt; Xiang Chen; Ruiyi Zhang; Jiuxiang Gu; Hanieh Deilamsalehy; Nedim Lipka

TL;DR

This work introduces a multi-LLM debiasing framework in which models collaborate through either a centralized or a decentralized protocol to reduce bias in LLM outputs. Evaluated on BBQ-Hard, a benchmark of hard instances, with structured interaction protocols, the study reports that decentralized multi-LLM configurations mitigate bias more effectively across multiple social groups, though the choice of interacting models and the task can influence results. Extensive ablations and cross-model analyses show that bias can be reduced without severely compromising accuracy, which points to the practical potential of conversational multi-LLM debiasing. The paper also discusses limitations and ethical considerations, framing debiasing as one component of broader fairness efforts rather than a complete solution.

Abstract

Large Language Models (LLMs) are powerful tools with the potential to benefit society immensely, yet they have demonstrated biases that perpetuate societal inequalities. Despite significant advances in bias mitigation techniques using data augmentation, zero-shot prompting, and model fine-tuning, biases persist, including subtle biases that may elude human detection. Recent research has shown a growing interest in multi-LLM approaches, which have been shown to improve the quality of reasoning and factuality in LLMs. Building on this approach, we propose a novel multi-LLM debiasing framework aimed at reducing bias in LLMs. Our work is the first to introduce and evaluate two distinct approaches within this framework for debiasing LLMs: a centralized method, where the conversation is facilitated by a single central LLM, and a decentralized method, where all models communicate directly. Our findings reveal that our multi-LLM framework significantly reduces bias in LLMs, outperforming the baseline method across several social groups.
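The two communication patterns named in the abstract can be pictured as graphs over the participating models. The following is a minimal sketch, assuming the centralized method corresponds to a star around one central model and the decentralized method to a channel between every pair of models; the function names and the integer model indices are hypothetical and do not come from the paper.

```python
from itertools import combinations


def centralized_edges(k: int, central: int = 0) -> set[tuple[int, int]]:
    """Assumed star topology: each non-central model talks only to the central one."""
    return {(min(central, i), max(central, i)) for i in range(k) if i != central}


def decentralized_edges(k: int) -> set[tuple[int, int]]:
    """Assumed complete topology: every pair of models shares a channel."""
    return set(combinations(range(k), 2))


if __name__ == "__main__":
    print(sorted(centralized_edges(4)))
    print(sorted(decentralized_edges(4)))
```

Under these assumptions the centralized pattern needs k - 1 channels, while the decentralized pattern needs k(k - 1)/2, which gives a rough sense of how the number of exchanges grows with the number of models.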

Paper Structure (31 sections, 10 equations, 5 figures, 15 tables)

Introduction
Related Work
Multi-LLM Techniques in LLMs
Data Debiasing
Response Debiasing
Model Debiasing
Ensemble Techniques in LLMs
BBQ-Hard Benchmark
Multi-LLM Debiasing Framework
Centralized
Decentralized
Methodology
Bias Benchmark for QA (BBQ)
Baseline Approach
Centralized Multi-LLM Approach
...and 16 more sections

Figures (5)

Figure 1: (a) Distribution of bootstrapped bias scores for the baseline, multi-LLM decentralized, and multi-LLM centralized approaches. The dashed line shows the bias score without bootstrapping, (b) the communication topology for our centralized multi-LLM debiasing framework, and (c) the communication topology for our decentralized multi-LLM debiasing framework. For both (b) and (c), the nodes represent the different LLMs, and the edges represent the communication channel between the models.
Figure 2: Overview of centralized and decentralized multi-LLM processes. The blue arrows represent the transition to the next step in the process. For further details, see the Centralized and Decentralized framework sections.
Figure 3: Baseline prompt
Figure 4: Centralized and decentralized method prompts
Figure 5: Overview of Centralized Multi-LLM Debiasing Framework. Note that each node represents an LLM whereas edges between the nodes indicate their communication. The central LLM is shown in black whereas the non-central/leaf LLMs are shown in green. Further, a self-loop represents that the model generates a response, that is, in (a) we see a self-loop with x, which indicates that the model uses the input x to generate an initial response $y_1$, whereas later in (c) we see that the other models $M_2,\ldots, M_k$ have self-loops with $x, y_1$ as input to generate new responses for each denoted as $y_2,\ldots,y_k$, respectively. See text for discussion.
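The opening steps described in the Figure 5 caption can be sketched in code. This is an illustrative sketch only: it covers just the two steps the caption states (the central model produces $y_1$ from $x$, then each of $M_2,\ldots,M_k$ produces its own response from $x$ and $y_1$). The `Model` interface, the function name, and the prompt layout are all assumptions, and whatever the framework does after these steps is not shown here.

```python
from typing import Callable, Sequence

# Hypothetical interface: a model maps a prompt string to a response string.
Model = Callable[[str], str]


def centralized_first_round(x: str, models: Sequence[Model]) -> list[str]:
    """Return [y_1, ..., y_k] for the first two steps in the Figure 5 caption."""
    if not models:
        raise ValueError("need at least one model")
    central, leaves = models[0], models[1:]

    # Step (a): the central model answers the input x on its own.
    y1 = central(x)

    # Step (c): each leaf model sees both x and y_1.
    # The prompt layout below is an assumption, not the paper's prompt.
    leaf_prompt = f"Question: {x}\nCentral response: {y1}"
    return [y1] + [leaf(leaf_prompt) for leaf in leaves]


if __name__ == "__main__":
    # Stub models stand in for real LLM calls.
    stubs = [lambda p: "A", lambda p: "saw " + p.splitlines()[-1]]
    print(centralized_first_round("q", stubs))
```

Treating each model as a plain callable keeps the sketch independent of any particular LLM client, so real API calls can be swapped in for the stubs.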
