Trustworthy Social Bias Measurement

Rishi Bommasani, Percy Liang

TL;DR

Existing measures of social bias in NLP have not earned trust: they lack demonstrated reliability and clear validation. The authors introduce DivDist, a general measurement framework that computes bias by comparing observed associations to a reference through a normalization step and a divergence measure, and they instantiate five measures spanning text, static embeddings, and contextualized representations. They pair DivDist with a testing protocol of eight criteria drawn from measurement modeling and report evidence of both validity and reliability across multiple experiments, including correlations with employment trends and analyses of GPT-2 bias amplification and debiasing methods. The work argues for a principled, reference-frame-aware approach to bias measurement that can guide responsible deployment of language technologies and mitigation of their biases.

Abstract

How do we design measures of social bias that we trust? While prior work has introduced several measures, no measure has gained widespread trust: instead, mounting evidence argues we should distrust these measures. In this work, we design bias measures that warrant trust based on the cross-disciplinary theory of measurement modeling. To combat the frequently fuzzy treatment of social bias in NLP, we explicitly define social bias, grounded in principles drawn from social science research. We operationalize our definition by proposing a general bias measurement framework DivDist, which we use to instantiate 5 concrete bias measures. To validate our measures, we propose a rigorous testing protocol with 8 testing criteria (e.g. predictive validity: do measures predict biases in US employment?). Through our testing, we demonstrate considerable evidence to trust our measures, showing they overcome conceptual, technical, and empirical deficiencies present in prior measures.
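
The DivDist idea summarized above (normalize observed associations, then measure their divergence from a reference) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the sum-to-one normalization, the choice of total variation distance as the divergence, the uniform reference, and the made-up association counts.

```python
import math

def normalize(scores):
    """Turn nonnegative association scores into a probability distribution.
    Assumption: simple sum-to-one normalization."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("association scores must have a positive sum")
    return {group: s / total for group, s in scores.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions.
    Assumption: one possible divergence; others could be swapped in."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def divdist_bias(scores, reference, divergence=total_variation):
    """Bias as the divergence of normalized associations from a reference."""
    return divergence(normalize(scores), reference)

# Hypothetical association counts between a target word and two groups.
observed = {"female": 30.0, "male": 70.0}
# Hypothetical reference: associations split evenly across groups.
reference = {"female": 0.5, "male": 0.5}

bias = divdist_bias(observed, reference)
print(f"bias = {bias:.3f}")
```

Under these assumptions the score is 0 when the normalized associations match the reference and grows as they diverge from it. Changing the reference changes what counts as unbiased, which is why the choice of reference frame matters.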

Paper Structure (26 sections, 5 equations, 8 tables)