BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs

Zhiting Fan; Ruizhe Chen; Ruiling Xu; Zuozhu Liu

TL;DR

BiasAlert addresses bias evaluation in open-text generation by combining a retrieval-augmented knowledge base with instruction-following reasoning. It is a plug-and-play bias detection tool that takes LLM outputs $Y$ and produces judgments $J$ with explanations, grounded in a Social Bias Retrieval Database derived from SBIC and in an instruction-tuning dataset. Empirical results on RedditBias and CrowS-Pairs show that BiasAlert outperforms state-of-the-art baselines and confirm the necessity of retrieval and step-by-step guidance. Application studies demonstrate BiasAlert for bias evaluation and bias mitigation in deployment, underscoring its practical value for fairer LLM usage.

Abstract

Evaluating bias in Large Language Models (LLMs) has become increasingly crucial with their rapid development. However, existing evaluation methods rely on fixed-form outputs and cannot adapt to the flexible open-text generation scenarios of LLMs (e.g., sentence completion and question answering). To address this, we introduce BiasAlert, a plug-and-play tool designed to detect social bias in the open-text generations of LLMs. BiasAlert integrates external human knowledge with inherent reasoning capabilities to detect bias reliably. Extensive experiments demonstrate that BiasAlert significantly outperforms existing state-of-the-art methods such as GPT4-as-A-Judge in detecting bias. Furthermore, through application studies, we demonstrate the utility of BiasAlert for reliable LLM bias evaluation and bias mitigation across various scenarios. Model and code will be publicly released.
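
The detection flow summarized above (take a generation $Y$, retrieve related human-annotated knowledge, then ask an instruction-following model for a judgment $J$ with an explanation) can be sketched roughly as follows. This is an illustrative sketch only, not the released BiasAlert code: the names `retrieve`, `build_prompt`, `detect`, and `Judgment` are hypothetical, the toy Jaccard word-overlap retriever is a stand-in for whatever retriever the tool actually uses, and the judge model is left as a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Judgment:
    """Hypothetical container for the judgment J and its explanation."""
    biased: bool
    explanation: str


def tokenize(text: str) -> set[str]:
    # Lowercased word set with surrounding punctuation stripped.
    return {tok.strip(".,!?;:\"'").lower() for tok in text.split()} - {""}


def retrieve(query: str, database: list[str], k: int = 2) -> list[str]:
    # Toy lexical retriever (assumption): rank entries by Jaccard overlap
    # with the query and return the top k.
    q = tokenize(query)

    def score(entry: str) -> float:
        e = tokenize(entry)
        union = q | e
        return len(q & e) / len(union) if union else 0.0

    return sorted(database, key=score, reverse=True)[:k]


def build_prompt(generation: str, references: list[str]) -> str:
    # Assemble an instruction-style prompt: retrieved references first,
    # then the text to judge, then a request for step-by-step reasoning.
    refs = "\n".join(f"- {r}" for r in references)
    return (
        "Reference statements annotated as socially biased:\n"
        f"{refs}\n\n"
        f"Text to evaluate:\n{generation}\n\n"
        "Step by step, decide whether the text is biased and explain why."
    )


def detect(
    generation: str,
    database: list[str],
    judge: Callable[[str], Judgment],
    k: int = 2,
) -> Judgment:
    # Y -> retrieval -> prompt -> J. The judge is any callable that maps a
    # prompt to a Judgment (for example, a wrapper around an LLM call).
    references = retrieve(generation, database, k)
    return judge(build_prompt(generation, references))
```

Keeping the judge as a plain callable is one way to reflect the plug-and-play idea: the retrieval and prompting steps stay fixed while the model behind the judgment can be swapped.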

Paper Structure (46 sections, 4 figures, 6 tables)

Introduction
Method
Task Formulation
Social Bias Retrieval Database
Instruction-following Bias Detection
Experiment and Analysis
Experiment Setup
Datasets.
Baselines.
Evaluating Metrics.
Bias Detection Results
Ablation Study
Applications
Bias Evaluation with BiasAlert
Setup.
...and 31 more sections

Figures (4)

Figure 1: Overview of BiasAlert, designed to address the challenges in existing bias evaluation methods.
Figure 2: An illustration of the pipeline of our BiasAlert.
Figure 3: Bias evaluation results of BiasAlert.
Figure 4: Distribution of detection accuracy of baseline models on four bias types.
