Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings

Thomas Manzini, Yao Chong Lim, Yulia Tsvetkov, Alan W Black

TL;DR

This paper addresses bias in word embeddings by extending prior binary debiasing (e.g., binary gender) to multiclass attributes such as race and religion. It identifies a multiclass bias subspace via PCA over class-defining word sets, then applies hard and soft debiasing to remove the bias components. It also introduces a new evaluation metric, mean average cosine similarity (MAC). The authors report that multiclass debiasing reduces measured bias and generally preserves performance on downstream NLP tasks such as NER, POS tagging, and chunking, though the effect varies by task. They also discuss limitations, notably the reliance on US-centric lexicons and the incomplete removal of cluster bias, and outline directions for cross-cultural extensions and more robust evaluation.
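The subspace identification and hard-debiasing steps summarized above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes each defining set is centered on its own mean before PCA, it shows only the neutralize part of hard debiasing (the equalize step and soft debiasing are omitted), and the toy vocabulary and function names are hypothetical.

```python
import numpy as np

def bias_subspace(emb, defining_sets, k):
    """Return the top-k principal directions of the mean-centered defining sets."""
    centered = []
    for words in defining_sets:
        vecs = np.stack([emb[w] for w in words])
        centered.append(vecs - vecs.mean(axis=0))  # center each set on its own mean
    stacked = np.concatenate(centered)
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k]

def neutralize(vec, basis):
    """Remove the component of vec inside the subspace, then renormalize.

    Assumes vec does not lie entirely inside the subspace (nonzero residual).
    """
    residual = vec - basis.T @ (basis @ vec)
    return residual / np.linalg.norm(residual)

# Hypothetical toy embeddings: the class signal lives on the first axis.
emb = {
    "a1": np.array([1.0, 0.5, 0.0]), "b1": np.array([-1.0, 0.5, 0.0]), "c1": np.array([0.0, 0.5, 0.0]),
    "a2": np.array([2.0, 0.0, 1.0]), "b2": np.array([-2.0, 0.0, 1.0]), "c2": np.array([0.0, 0.0, 1.0]),
}
defining_sets = [("a1", "b1", "c1"), ("a2", "b2", "c2")]

basis = bias_subspace(emb, defining_sets, k=1)
debiased = neutralize(np.array([0.6, 0.8, 0.0]), basis)
```

In this toy setup the recovered direction is the first axis (up to sign), so the neutralized vector keeps no component along it.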

Abstract

Online texts -- across genres, registers, domains, and styles -- are riddled with human stereotypes, expressed in overt or subtle ways. Word embeddings, trained on these texts, perpetuate and amplify these stereotypes, and propagate biases to machine learning models that use word embeddings as features. In this work, we propose a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) from the binary setting, such as binary gender. Next, we propose a novel methodology for the evaluation of multiclass debiasing. We demonstrate that our multiclass debiasing is robust and maintains efficacy on standard NLP tasks.
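The evaluation methodology mentioned in the abstract corresponds to the MAC metric named in the TL;DR. The sketch below shows one way such a score could be computed. The formula is not spelled out on this page, so the code assumes that MAC averages the cosine distance (1 minus cosine similarity) between every target word and every attribute set; the function names and toy vectors are hypothetical.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance: 1 minus cosine similarity (assumed convention)."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def mac(targets, attribute_sets):
    """Average, over all (target, attribute set) pairs, of the mean distance to the set."""
    scores = []
    for t in targets:
        for attrs in attribute_sets:
            scores.append(np.mean([cosine_distance(t, a) for a in attrs]))
    return float(np.mean(scores))

# Hypothetical toy vectors: one target word, two single-word attribute sets.
targets = [np.array([1.0, 0.0])]
attribute_sets = [[np.array([1.0, 0.0])], [np.array([0.0, 1.0])]]

mac_value = mac(targets, attribute_sets)
```

Under this assumed convention, a score closer to 1 means the target words are, on average, closer to orthogonal to the attribute words, so comparing the score before and after debiasing indicates how much association was removed.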

Paper Structure

This paper contains 19 sections, 8 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Number of neighbors of "jew" for each profession as a function of the profession's original bias with respect to "jew", before and after debiasing, for different subspace dimensionalities $k$.
  • Figure 2: Number of neighbors of "muslim" for each profession as a function of the profession's original bias with respect to "muslim", before and after debiasing, for different subspace dimensionalities $k$.