"The Dentist is an involved parent, the bartender is not": Revealing Implicit Biases in QA with Implicit BBQ

Aarushi Wagh, Saniya Srivastava

TL;DR

Existing bias benchmarks rely on explicitly stated protected attributes and therefore fail to capture implicit biases in real-world language use. This work introduces ImplicitBBQ, a prompt-based rewriting extension of BBQ that embeds implicit cues across six categories to test LLM fairness in more naturalistic contexts. Empirical evaluation on GPT-4o reveals substantial performance drops in several categories, indicating that models harbor implicit biases that explicit benchmarks do not detect, with nuanced differences between certain and uncertain predictions. The study argues for fairness evaluations that generalize beyond explicit cues and presents ImplicitBBQ as a tool for robust, nuanced bias testing in NLP.

Abstract

Existing benchmarks that evaluate biases in large language models (LLMs) rely primarily on explicit cues, stating protected attributes such as religion, race, or gender by name. However, real-world interactions often contain implicit biases, inferred subtly through names, cultural cues, or traits. Overlooking these cues creates a significant blind spot in fairness evaluation. We introduce ImplicitBBQ, a benchmark extending the Bias Benchmark for QA (BBQ) with implicitly cued protected attributes across 6 categories. Our evaluation of GPT-4o on ImplicitBBQ reveals a troubling performance gap relative to explicit BBQ prompts, with accuracy declining by up to 7% in the "sexual orientation" subcategory and consistent declines observed across most other categories. This indicates that current LLMs contain implicit biases undetected by explicit benchmarks. ImplicitBBQ offers a crucial tool for nuanced fairness evaluation in NLP.

Paper Structure

This paper contains 7 sections, 3 tables.