Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling

Matúš Pikuliak; Andrea Hrckova; Stefan Oresko; Marián Šimko

TL;DR

GEST is a new manually created dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems. It contains samples for 16 gender stereotypes about men and women that are compatible with English and 9 Slavic languages.

Abstract

We present GEST -- a new manually created dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems. GEST contains samples for 16 gender stereotypes about men and women (e.g., Women are beautiful, Men are leaders) that are compatible with the English language and 9 Slavic languages. The definition of said stereotypes was informed by gender experts. We used GEST to evaluate English and Slavic masked LMs, English generative LMs, and machine translation systems. We discovered significant and consistent amounts of gender-stereotypical reasoning in almost all the evaluated models and languages. Our experiments confirm the previously postulated hypothesis that the larger the model, the more stereotypical it usually is.

Paper Structure (76 sections, 5 equations, 14 figures, 11 tables)

Introduction
Related Work
Gender Bias in LMs
Gender Bias in Machine Translation
GEST Dataset
List of Stereotypes
Sample Definition
Data Collection
Bias Measurements
English-to-Slavic Machine Translation
Metrics
Experiment
Results
Comparing MT systems.
Comparing stereotypes.
...and 61 more sections

Figures (14)

Figure 1: Basic overview of how we use one sample to test four different types of NLP systems. For all systems, we observe the grammatical gender (either feminine or masculine) of the predictions when the model is exposed to a stereotypical sentence. Other Slavic languages are used in the same way as Slovak is in this example.
Figure 2: Comparison of the global masculine rate $f_m$ and the stereotype rate $f_s$ for MT systems and target languages.
Figure 3: Boxplots for the feminine ranks of the stereotypes across all system-language pairs we evaluated in the MT experiment.
Figure 4: Stereotype rates $g_s$ for English MLMs and GLMs. GLMs are color-coded based on their family. The average score across all compatible templates is reported.
Figure 5: Boxplots for the feminine ranks of the stereotypes across all model-template pairs we evaluated in the experiment with English MLMs.
...and 9 more figures
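
The Figure 1 caption describes observing the grammatical gender (feminine or masculine) of a system's predictions for each stereotypical sentence. As a rough illustration only, the sketch below tallies such observations into a share of masculine predictions, per stereotype and overall. The function names, the `"m"`/`"f"` labels, and the toy data are assumptions made for this example; the code does not reproduce the paper's definitions of $f_m$, $f_s$, or $g_s$, which are not given in this excerpt.

```python
from collections import defaultdict


def masculine_rate_by_stereotype(predictions):
    """Return the share of masculine predictions for each stereotype.

    `predictions` is an iterable of (stereotype, gender) pairs, where
    gender is "m" or "f". Both the input shape and the labels are
    assumptions of this sketch, not the paper's data format.
    """
    counts = defaultdict(lambda: [0, 0])  # stereotype -> [masculine, total]
    for stereotype, gender in predictions:
        if gender not in ("m", "f"):
            raise ValueError(f"unexpected gender label: {gender!r}")
        counts[stereotype][0] += 1 if gender == "m" else 0
        counts[stereotype][1] += 1
    return {s: masc / total for s, (masc, total) in counts.items()}


def overall_masculine_rate(predictions):
    """Return the share of masculine predictions over all samples."""
    genders = [gender for _, gender in predictions]
    if not genders:
        raise ValueError("no predictions given")
    return sum(1 for g in genders if g == "m") / len(genders)


# Toy observations (hypothetical, not taken from GEST).
toy_predictions = [
    ("men are leaders", "m"),
    ("men are leaders", "m"),
    ("men are leaders", "m"),
    ("men are leaders", "f"),
    ("women are beautiful", "f"),
    ("women are beautiful", "f"),
    ("women are beautiful", "f"),
    ("women are beautiful", "m"),
]

per_stereotype = masculine_rate_by_stereotype(toy_predictions)
overall = overall_masculine_rate(toy_predictions)
```

In this toy data, a system that followed the stereotypes would show a higher masculine share for the male-coded stereotype than for the female-coded one, which is the kind of contrast the figures compare across systems and languages.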
