GeNRe: A French Gender-Neutral Rewriting System Using Collective Nouns

Enzo Doyen, Amalia Todirascu

TL;DR

GeNRe tackles gender bias in French text by introducing the first French gender-neutral rewriting system based on collective nouns (CNs). It combines a rule-based system (RBS), fine-tuned Seq2Seq models, and an instruct-based model (Claude 3 Opus), all trained on or guided by a manually built CN dictionary and a dataset derived from French corpora. The rule-based system performs best, the instruct-based DICT configuration comes close, and the fine-tuned models are competitive but generally score lower. The work demonstrates that CN-based neutralization is viable for French, lays the groundwork for applying it to other languages with collective nouns, and contributes resources and benchmarks for future bias-mitigation efforts in NLP.

Abstract

A significant portion of the textual data used in the field of Natural Language Processing (NLP) exhibits gender biases, particularly due to the use of masculine generics (masculine words that are supposed to refer to mixed groups of men and women), which can perpetuate and amplify stereotypes. Gender rewriting, an NLP task that involves automatically detecting and replacing gendered forms with neutral or opposite forms (e.g., from masculine to feminine), can be employed to mitigate these biases. While such systems have been developed in a number of languages (English, Arabic, Portuguese, German, French), automatic use of gender neutralization techniques (as opposed to inclusive or gender-switching techniques) has only been studied for English. This paper presents GeNRe, the very first French gender-neutral rewriting system using collective nouns, which are gender-fixed in French. We introduce a rule-based system (RBS) tailored for the French language alongside two fine-tuned language models trained on data generated by our RBS. We also explore the use of instruct-based models to enhance the performance of our other systems and find that Claude 3 Opus combined with our dictionary achieves results close to our RBS. Through this contribution, we hope to promote the advancement of gender bias mitigation techniques in NLP for French.

Paper Structure

This paper contains 26 sections, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Rule-based model replacement pipeline overview
  • Figure 2: Error distribution for RBS, fine-tuned and instruct-based models