Gender Bias in MT for a Genderless Language: New Benchmarks for Basque

Amaia Murillo, Olatz Perez-de-Viñaspre, Naiara Perez

TL;DR

An evaluation of several general-purpose LLMs and of open and proprietary MT systems reveals a systematic preference for masculine forms and, in some models, slightly higher translation quality for masculine referents. These results highlight the need for evaluation methods that consider both linguistic features and cultural context.

Abstract

Large language models (LLMs) and machine translation (MT) systems are increasingly used in our daily lives, but their outputs can reproduce gender bias present in the training data. Most resources for evaluating such biases are designed for English and reflect its sociocultural context, which limits their applicability to other languages. This work addresses this gap by introducing two new datasets to evaluate gender bias in translations involving Basque, a low-resource and genderless language. WinoMTeus adapts the WinoMT benchmark to examine how gender-neutral Basque occupations are translated into gendered languages such as Spanish and French. FLORES+Gender, in turn, extends the FLORES+ benchmark to assess whether translation quality varies when translating from gendered languages (Spanish and English) into Basque depending on the gender of the referent. We evaluate several general-purpose LLMs and open and proprietary MT systems. The results reveal a systematic preference for masculine forms and, in some models, a slightly higher quality for masculine referents. Overall, these findings show that gender bias is still deeply rooted in these models, and highlight the need to develop evaluation methods that consider both linguistic features and cultural context.

Paper Structure (29 sections, 2 figures, 5 tables)

This paper contains 29 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Overview of WinoMTeus. Sentences with gender-neutral (N) occupation mentions, taken from WinoMT (Stanovsky et al., 2019) and translated into Basque, are translated into a gendered language; we compare the resulting distribution of gendered occupation forms (male M or female F) with real-world data.
  • Figure 2: Overview of FLORES+Gender. Using FLORES+ sentence pairs between Basque and a gendered language, we create a contrastive version of each gendered sentence (male M or female F). Both versions are translated into Basque, and translation quality is compared to assess the effect of source-sentence gender.
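
The two comparisons described in the figure captions can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' evaluation code: the function names, the toy labels, the reference share, and the per-sentence scores are all invented for the example, and the source does not say which quality metric or which gender-detection step is used.

```python
from collections import Counter
from statistics import mean


def gender_share(labels):
    """Fraction of masculine (M) and feminine (F) forms among gendered outputs.

    Labels other than "M" or "F" (for example "N" for a neutral or
    undetermined form) are ignored.
    """
    counts = Counter(labels)
    total = counts["M"] + counts["F"]
    if total == 0:
        return {"M": 0.0, "F": 0.0}
    return {"M": counts["M"] / total, "F": counts["F"] / total}


def share_deviation(observed, reference):
    """Observed feminine share minus a reference (e.g. real-world) feminine share."""
    return observed["F"] - reference["F"]


def quality_gap(scores_m, scores_f):
    """Mean quality for masculine sources minus mean quality for feminine sources.

    Both lists hold one score per contrastive pair, in the same order.
    A positive value means masculine-referent sources were translated better.
    """
    if len(scores_m) != len(scores_f):
        raise ValueError("expected one score per contrastive pair")
    return mean(scores_m) - mean(scores_f)


# WinoMTeus-style comparison (toy data): the gender of the occupation noun
# in each translation of a gender-neutral Basque sentence.
labels = ["M", "M", "F", "M", "N"]
share = gender_share(labels)

# Hypothetical real-world shares for the same occupations.
reference = {"M": 0.5, "F": 0.5}
deviation = share_deviation(share, reference)

# FLORES+Gender-style comparison (toy data): per-sentence quality scores
# for the masculine and feminine versions of each source sentence.
scores_m = [0.5, 0.75]
scores_f = [0.25, 0.5]
gap = quality_gap(scores_m, scores_f)

print(share, deviation, gap)
```

With these toy inputs, a feminine share below the reference and a positive quality gap would both point in the direction of the masculine preference reported in the abstract; real conclusions would of course require the actual datasets and metrics.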