GATE X-E: A Challenge Set for Gender-Fair Translations from Weakly Gendered Languages

Spencer Rarrick, Ranjita Naik, Sundar Poudel, Vishal Chowdhary

TL;DR

GATE X-E is a benchmark for evaluating gender bias in translations from weakly gendered languages (Turkish, Hungarian, Finnish, Persian) into English. Each translation comes with feminine, masculine, and neutral variants built around arbitrarily gender-marked entities (AGMEs), and the release includes an open-source, GPT-4-based translation gender rewriting solution. The dataset provides detailed annotations, label definitions, and statistics that characterize how AGMEs propagate or transform gender information in translation. Empirical results show that GPT-4 achieves high exact-match accuracy on pronoun-only rewrites but struggles considerably with gendered-noun rewrites, which underscores how much coreference and noun gender complicate rewriting. By releasing GATE X-E and the associated tooling, the work supports further research on debiasing MT and on evaluating gender-aware rewriting strategies in multilingual settings.

Abstract

Neural Machine Translation (NMT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies on gender bias in translations into English from weakly gendered languages, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE corpus (Rarrick et al., 2023) that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. Each translation is accompanied by feminine, masculine, and neutral variants. The dataset, which contains between 1250 and 1850 instances for each of the four language pairs, features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena. Additionally, we present a translation gender rewriting solution built with GPT-4 and use GATE X-E to evaluate it. We open-source our contributions to encourage further research on gender debiasing.
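To make the evaluation setup concrete, the following is a minimal sketch of how an instance with its three gender variants might be represented and how exact-match accuracy for a rewriter could be computed. The `Instance` class, its field names, the whitespace normalization, and the sample sentence are all assumptions made for illustration; they are not the released GATE X-E schema, the authors' scoring script, or data taken from the corpus.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical record layout; field names are assumptions, not the released schema."""
    source: str      # sentence in the weakly gendered source language
    feminine: str    # reference English translation, feminine variant
    masculine: str   # reference English translation, masculine variant
    neutral: str     # reference English translation, gender-neutral variant


def normalize(text: str) -> str:
    """Collapse runs of whitespace so spacing differences do not count as errors."""
    return " ".join(text.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Illustrative sentence written for this sketch, not drawn from the dataset.
example = Instance(
    source="O bir doktor.",
    feminine="She is a doctor.",
    masculine="He is a doctor.",
    neutral="They are a doctor.",
)

# Score a hypothetical rewriter output against the neutral reference.
predicted_neutral = ["They are a doctor."]
print(exact_match_accuracy(predicted_neutral, [example.neutral]))
```

Exact match is a strict criterion: a rewrite that changes any token beyond the intended gender markers counts as wrong, which is one reason it is informative for a rewriting task where everything except gender marking should stay fixed.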

Paper Structure (31 sections, 8 figures, 14 tables)

Figures (8)

  • Figure 1: Gender Bias in Turkish-English Translation. When translating from Turkish to English, the model tends to use the feminine pronoun "she" for gender-unspecified individuals, likely due to a perceived link between women and child care. This bias can be mitigated by providing feminine, masculine, and neutral rewrites.
  • Figure 2: GATE X-E Example Instance. This includes Turkish source; feminine, masculine and gender-neutral English translations; and labels.
  • Figure 3: Boxplots representing the distribution of sentence lengths in source and target languages. The four language pairs are Finnish to English (fi > en), Hungarian to English (hu > en), Persian to English (fa > en), and Turkish to English (tr > en). The left plot shows the source language sentence lengths, and the right plot shows the target language (English) sentence lengths. The color of each boxplot corresponds to the language pair as indicated in the legend.
  • Figure 4: Distribution of errors in GPT-3.5 Turbo's zero-shot and few-shot settings. The majority of errors in both settings stem from unrelated modifications and the model's 'None' response, indicating no need for gender-neutral rewriting.
  • Figure 5: Zero-shot prompt template utilized in GPT-3.5 Turbo experiments.
  • ...and 3 more figures