BIASINSPECTOR: Detecting Bias in Structured Data through LLM Agents
Haoxuan Li, Mingyu Derek Ma, Jen-tse Huang, Zhaotian Weng, Wei Wang, Jieyu Zhao
TL;DR
This work tackles the challenge of detecting biases in structured data, where prior automated methods struggle to generalize across data types and bias types. It introduces BiasInspector, the first end-to-end, multi-agent framework that plans a detection task, executes it with a diverse toolbox of bias-detection methods, and reports detailed visualizations and explanations. A new benchmark, BiasBenchmark, evaluates both end results and intermediate processes, showing that BiasInspector achieves high accuracy (up to around 78% in bias-degree tasks) and robust planning and tool-use performance, especially when powered by GPT-4o. The framework’s extensible toolset and method library, coupled with a standardized evaluation protocol, offer a practical path toward fairer data workflows and a reference point for future research on LLM-agent bias detection.
Abstract
Detecting biases in structured data is a complex and time-consuming task. Existing automated techniques cover a limited range of data types and rely heavily on case-by-case human handling, which limits their generalizability. Large language model (LLM)-based agents have recently made significant progress in data science, but their ability to detect data biases remains insufficiently explored. To address this gap, we introduce BIASINSPECTOR, the first end-to-end, multi-agent synergy framework designed for automatic bias detection in structured data based on specific user requirements. It first develops a multi-stage plan to analyze the user-specified bias detection task and then executes the plan with a diverse and well-suited set of tools. It delivers detailed results that include explanations and visualizations. To address the lack of a standardized framework for evaluating how well LLM agents detect biases in data, we further propose a comprehensive benchmark that includes multiple evaluation metrics and a large set of test cases. Extensive experiments demonstrate that our framework achieves exceptional overall performance in structured data bias detection, setting a new milestone for fairer data applications.
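
To make "bias detection in structured data" concrete, the sketch below shows the kind of check such a toolbox might contain: a demographic parity gap, i.e. the difference in positive-outcome rates between groups defined by a sensitive attribute. This is an illustration only and is not taken from BiasInspector. The function names (positive_rates, demographic_parity_gap), the column names (sex, hired), and the toy rows are all hypothetical.

```python
from collections import defaultdict


def positive_rates(rows, group_key, outcome_key, positive=1):
    """Return the share of positive outcomes for each group value."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in rows:
        tally = counts[row[group_key]]
        tally[1] += 1
        if row[outcome_key] == positive:
            tally[0] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def demographic_parity_gap(rows, group_key, outcome_key, positive=1):
    """Largest difference in positive-outcome rate between any two groups."""
    rates = positive_rates(rows, group_key, outcome_key, positive)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy table: one dict per record.
rows = [
    {"sex": "F", "hired": 1},
    {"sex": "F", "hired": 0},
    {"sex": "F", "hired": 0},
    {"sex": "F", "hired": 0},
    {"sex": "M", "hired": 1},
    {"sex": "M", "hired": 1},
    {"sex": "M", "hired": 1},
    {"sex": "M", "hired": 0},
]

rates = positive_rates(rows, "sex", "hired")
gap = demographic_parity_gap(rows, "sex", "hired")
print(rates, gap)
```

A gap near zero suggests similar outcome rates across groups on this one metric; a large gap flags a disparity worth investigating. An agent-based framework of the kind described above would presumably choose among many such checks depending on the user's task and the data types involved.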
