Gender-preserving Debiasing for Pre-trained Word Embeddings

Masahiro Kaneko, Danushka Bollegala

TL;DR

The paper tackles gender bias in pre-trained word embeddings by introducing a four-set formulation (feminine, masculine, gender-neutral, stereotypical) and an autoencoder-based debiasing objective that preserves non-discriminative gender information while removing stereotypes. The projection-based gender-preserving (GP) method uses four loss terms (L_f, L_m, L_g, L_r) to balance bias removal with semantic preservation, and it can be applied to existing embeddings such as GloVe and GN-GloVe. Empirical results on SemBias and on standard semantic similarity and analogy tasks show that the method outperforms prior debiasing approaches while retaining word semantics, and that it can further refine embeddings already debiased by GN-GloVe. The approach works as a post-processing step on pre-trained embeddings and could be extended to other demographic biases in future work.

Abstract

Word embeddings learnt from massive text collections have demonstrated significant levels of discriminative biases such as gender, racial or ethnic biases, which in turn bias the down-stream NLP applications that use those word embeddings. Taking gender-bias as a working example, we propose a debiasing method that preserves non-discriminative gender-related information, while removing stereotypical discriminative gender biases from pre-trained word embeddings. Specifically, we consider four types of information: feminine, masculine, gender-neutral and stereotypical, which represent the relationship between gender vs. bias, and propose a debiasing method that (a) preserves the gender-related information in feminine and masculine words, (b) preserves the neutrality in gender-neutral words, and (c) removes the biases from stereotypical words. Experimental results on several previously proposed benchmark datasets show that our proposed method can debias pre-trained word embeddings better than existing SoTA methods proposed for debiasing word embeddings while preserving gender-related but non-discriminative information.

Paper Structure

This paper contains 21 sections, 5 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Cosine similarity between gender, gender-neutral, stereotypical words and the gender direction.
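The quantity named in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the gender direction is the difference between the vectors for "he" and "she", which is a common choice but is not confirmed by the text above. The 3-dimensional vectors and the helper names `cosine`, `gender_direction` and `gender_scores` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embeddings invented for this example (not real GloVe vectors).
toy_embeddings = {
    "he": [1.0, 0.2, 0.0],       # masculine word
    "she": [-1.0, 0.2, 0.0],     # feminine word
    "table": [0.0, 1.0, 0.5],    # gender-neutral word
    "nurse": [-0.6, 0.5, 0.4],   # stereotypical word, skewed toward "she"
}


def gender_direction(emb, masculine="he", feminine="she"):
    """Assumed gender direction: masculine vector minus feminine vector."""
    return [m - f for m, f in zip(emb[masculine], emb[feminine])]


def gender_scores(emb, words):
    """Cosine similarity of each word with the assumed gender direction."""
    direction = gender_direction(emb)
    return {w: cosine(emb[w], direction) for w in words}


scores = gender_scores(toy_embeddings, ["he", "she", "table", "nurse"])
for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

Under this convention, positive scores lean masculine and negative scores lean feminine. In terms of the abstract's goals, a debiasing method of this kind would aim to keep the scores of feminine and masculine words clearly non-zero, keep gender-neutral words near zero, and move stereotypical words such as the toy "nurse" vector toward zero.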