Structured Reasoning for Fairness: A Multi-Agent Approach to Bias Detection in Textual Data

Tianyi Huang, Elsa Fan

TL;DR

The paper addresses textual bias in large language models by introducing a multi-agent bias-detection pipeline that first classifies statements as fact or opinion, then scores bias intensity and generates concise justifications. Across 1,500 WikiNPOV samples, the approach achieves 84.9% accuracy, outperforming a zero-shot baseline by 13.0 percentage points and demonstrating robust performance across multiple LLMs. Its combination of improved detection and interpretable explanations advances fairness and accountability in AI, offering audit-friendly outputs suitable for deployment in high-stakes contexts. The framework lays groundwork for scalable, explainable bias assessment, with potential extensions to richer bias scoring, cross-domain evaluation, and multilingual applicability.

Abstract

From disinformation spread by AI chatbots to AI recommendations that inadvertently reinforce stereotypes, textual bias poses a significant challenge to the trustworthiness of large language models (LLMs). In this paper, we propose a multi-agent framework that systematically identifies biases by disentangling each statement as fact or opinion, assigning a bias intensity score, and providing concise, factual justifications. Evaluated on 1,500 samples from the WikiNPOV dataset, the framework achieves 84.9% accuracy, an improvement of 13.0% over the zero-shot baseline, demonstrating the efficacy of explicitly modeling fact versus opinion prior to quantifying bias intensity. By combining enhanced detection accuracy with interpretable explanations, this approach sets a foundation for promoting fairness and accountability in modern language models.

Paper Structure

This paper contains 23 sections, 2 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: An illustration of how biases present in a training dataset can be inherited by an AI model during training and reflected in the model's responses, potentially compromising objectivity.
  • Figure 2: Overview of the multi-agent bias detection pipeline. Text statements first enter a checker agent to be classified as fact or opinion. Factual statements are then verified by a justification agent for bias, while opinionated statements undergo evaluation by a validation agent. Finally, the system outputs a final decision (biased or unbiased) alongside a concise justification.
  • Figure 3: Confusion Matrix for the Baseline on 100 WikiNPOV statements (GPT-4o).
  • Figure 4: Confusion Matrix for the Pipeline on 100 WikiNPOV statements (GPT-4o).
  • Figure 5: Comparing JSON Output of Baseline and Pipeline
  • ...and 1 more figure
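
The routing described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Verdict`, `run_pipeline`, and the three `stub_*` agents are hypothetical, and the stubs use simple keyword rules purely as stand-ins for the LLM-backed agents in the paper. The bias intensity score mentioned in the abstract is omitted because its scale is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    statement: str
    kind: str            # "fact" or "opinion", as decided by the checker agent
    biased: bool         # final decision
    justification: str   # concise explanation for the decision


# Each agent is a plain callable so an LLM-backed version could be swapped in.
Checker = Callable[[str], str]                 # returns "fact" or "opinion"
Assessor = Callable[[str], Tuple[bool, str]]   # returns (biased, justification)


def run_pipeline(statement: str, checker: Checker,
                 justification_agent: Assessor,
                 validation_agent: Assessor) -> Verdict:
    """Route a statement: checker first, then the agent matching its kind."""
    kind = checker(statement)
    if kind not in ("fact", "opinion"):
        raise ValueError(f"checker returned unexpected label: {kind!r}")
    # Facts go to the justification agent, opinions to the validation agent.
    assessor = justification_agent if kind == "fact" else validation_agent
    biased, why = assessor(statement)
    return Verdict(statement, kind, biased, why)


# --- Hypothetical stand-in agents (keyword rules, NOT the paper's prompts) ---
OPINION_MARKERS = ("best", "worst", "clearly", "obviously")
LOADED_WORDS = ("notorious", "so-called", "regime")


def stub_checker(statement: str) -> str:
    text = statement.lower()
    return "opinion" if any(m in text for m in OPINION_MARKERS) else "fact"


def stub_justification_agent(statement: str) -> Tuple[bool, str]:
    text = statement.lower()
    hits = [w for w in LOADED_WORDS if w in text]
    if hits:
        return True, f"Factual claim uses loaded wording: {', '.join(hits)}."
    return False, "Factual claim stated in neutral wording."


def stub_validation_agent(statement: str) -> Tuple[bool, str]:
    if "according to" in statement.lower():
        return False, "Opinion is attributed to a source."
    return True, "Opinion is presented without attribution."


if __name__ == "__main__":
    for s in ("The city was founded in 1820.",
              "This is clearly the best policy."):
        print(run_pipeline(s, stub_checker,
                           stub_justification_agent, stub_validation_agent))
```

Passing the agents in as callables keeps the routing logic separate from whichever model backs each agent, which is one plausible way to compare several LLMs on the same pipeline.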