Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models

Andrea Piergentili, Beatrice Savoldi, Matteo Negri, Luisa Bentivogli

TL;DR

This work tackles gender bias in English-to-Italian (en→it) machine translation by using neomorphemes to express gender inclusivity. It introduces Neo-GATE, a benchmark that uses adaptable tags to evaluate en→it translations across evolving neomorpheme paradigms, and investigates prompting strategies on four LLMs. GPT-4 and Mixtral generally generate correct neomorpheme forms more reliably when guided by well-designed prompts and demonstrations, while Llama 2 and Tower underperform on this task. Overall, Neo-GATE provides a resource and framework for gender-inclusive MT research and LLM-based translation systems.

Abstract

Machine translation (MT) models are known to suffer from gender bias, especially when translating into languages with extensive gendered morphology. Accordingly, they still fall short in using gender-inclusive language that is also representative of non-binary identities. In this paper, we look at gender-inclusive neomorphemes, neologistic elements that avoid binary gender markings, as an approach towards fairer MT. In this direction, we explore prompting techniques with large language models (LLMs) to translate from English into Italian using neomorphemes. So far, this area has been under-explored due to its novelty and the lack of publicly available evaluation resources. We fill this gap by releasing Neo-GATE, a resource designed to evaluate gender-inclusive en→it translation with neomorphemes. With Neo-GATE, we assess four LLMs of different families and sizes, along with different prompt formats, identifying the strengths and weaknesses of each on this novel task for MT.
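
To make the idea of adaptable tags more concrete, the following is a minimal sketch of how a tagged reference translation might be instantiated under different neomorpheme paradigms. It is an illustration only: the tag names (`<ENDS>`, `<ENDP>`), the paradigm mappings, the example sentence, and the `instantiate` helper are all assumptions made for this sketch and are not taken from Neo-GATE itself.

```python
# Hypothetical placeholder tags: <ENDS> marks a singular ending, <ENDP> a plural one.
# These names and mappings are assumptions, not the actual Neo-GATE tagset.
SCHWA_PARADIGM = {"<ENDS>": "ə", "<ENDP>": "ɜ"}
ASTERISK_PARADIGM = {"<ENDS>": "*", "<ENDP>": "*"}


def instantiate(tagged_reference: str, paradigm: dict) -> str:
    """Replace each placeholder tag with the form prescribed by the given paradigm."""
    for tag, form in paradigm.items():
        tagged_reference = tagged_reference.replace(tag, form)
    return tagged_reference


# Illustrative tagged reference (not drawn from the benchmark).
tagged = "Sono stat<ENDS> invitat<ENDS> alla festa."

print(instantiate(tagged, SCHWA_PARADIGM))     # Sono statə invitatə alla festa.
print(instantiate(tagged, ASTERISK_PARADIGM))  # Sono stat* invitat* alla festa.
```

Under this assumption, the same tagged reference can serve several paradigms: supporting a new one only requires a new tag-to-form mapping, not a new set of references.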

Paper Structure (25 sections, 3 figures, 10 tables)

This paper contains 25 sections, 3 figures, and 10 tables.

Figures (3)

  • Figure 1: Coverage and accuracy results in the few-shot settings. Darker shades indicate better performance.
  • Figure 2: Coverage-weighted accuracy percentage scores for the few-shot settings. Darker shades indicate better performance.
  • Figure 3: Mis-generation percentage scores for the few-shot settings. Higher scores (darker shades) indicate worse performance.