A Multi-LLM Debiasing Framework
Deonna M. Owens, Ryan A. Rossi, Sungchul Kim, Tong Yu, Franck Dernoncourt, Xiang Chen, Ruiyi Zhang, Jiuxiang Gu, Hanieh Deilamsalehy, Nedim Lipka
TL;DR
This work introduces a multi-LLM debiasing framework that uses centralized and decentralized model collaboration to reduce bias in LLM outputs. By pairing BBQ-Hard, a benchmark of hard instances, with structured interaction protocols, the study demonstrates that decentralized multi-LLM configurations more effectively mitigate bias across multiple social groups, though the choice of interacting models and the task can influence results. Evaluation on BBQ-Hard, along with extensive ablations and cross-model analyses, shows that bias reductions can be achieved without severely compromising accuracy, highlighting the practical potential of conversational multi-LLM debiasing. The paper also discusses limitations and ethical considerations, framing debiasing as a component of broader fairness efforts rather than a complete solution.
Abstract
Large Language Models (LLMs) are powerful tools with the potential to benefit society immensely, yet they have demonstrated biases that perpetuate societal inequalities. Despite significant advances in bias mitigation through data augmentation, zero-shot prompting, and model fine-tuning, biases persist, including subtle ones that may elude human detection. Recent research has shown growing interest in multi-LLM approaches, which have proven effective at improving the quality of reasoning and factuality in LLMs. Building on this line of work, we propose a novel multi-LLM debiasing framework aimed at reducing bias in LLMs. Our work is the first to introduce and evaluate two distinct approaches within this framework: a centralized method, where the conversation is facilitated by a single central LLM, and a decentralized method, where all models communicate directly. Our findings reveal that the multi-LLM framework significantly reduces bias in LLMs, outperforming the baseline method across several social groups.
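The difference between the two approaches can be pictured as a difference in who is allowed to message whom. The sketch below is only an illustration of that idea under simple assumptions: it models the centralized method as a star around one central model and the decentralized method as a fully connected graph. The function names and the placeholder model names are hypothetical and do not come from the paper, which may structure the conversation differently.

```python
from itertools import permutations


def centralized_edges(models, central):
    """Assumed star topology: every message goes to or from the central model."""
    if central not in models:
        raise ValueError("central model must be one of the participating models")
    edges = []
    for model in models:
        if model != central:
            edges.append((central, model))  # central model addresses a peer
            edges.append((model, central))  # peer replies to the central model
    return edges


def decentralized_edges(models):
    """Assumed fully connected topology: each model messages every other model."""
    return list(permutations(models, 2))


# Placeholder names, not the models evaluated in the paper.
models = ["model_a", "model_b", "model_c"]
star = centralized_edges(models, central="model_a")
full = decentralized_edges(models)
```

With three models, the star has four directed links, all touching `model_a`, while the fully connected graph has six; the gap widens as more models join, since the decentralized layout grows quadratically in the number of participants.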
