GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda, Andrea Piergentili, Beatrice Savoldi, Marco Madeddu, Martina Rosola, Silvia Casola, Chiara Ferrando, Viviana Patti, Matteo Negri, Luisa Bentivogli
TL;DR
This paper introduces the CALAMITA 2024 Gender-Fair Generation challenge, which promotes gender-fair language in Italian through detection, reformulation, and translation tasks in mono- and cross-lingual settings. It provides three benchmarks (GFL-it, GeNTE, and Neo-GATE) and standardized metrics for evaluating models: BERTScore-based F1 for detection, classifier-based accuracy for reformulation and translation, and coverage-weighted accuracy for the neomorpheme-based tasks. Its key contributions are curated, expert-annotated data and a suite of prompts that enable reproducible evaluation of conservative and innovative obscuration strategies as well as nonbinary neomorphemes. The work advances the practical understanding of gender bias in Italian NLP and lays the groundwork for expanding the datasets, developing new methods, and addressing accessibility concerns.
Abstract
Gender-fair language aims to promote gender equality by using terms and expressions that include all identities and avoid reinforcing gender stereotypes. Implementing gender-fair strategies is particularly challenging in heavily gender-marked languages such as Italian. To address this, the Gender-Fair Generation challenge is intended to support a shift toward gender-fair language in written communication. The challenge, designed to assess and monitor the recognition and generation of gender-fair language in both mono- and cross-lingual scenarios, includes three tasks: (1) the detection of gendered expressions in Italian sentences, (2) the reformulation of gendered expressions into gender-fair alternatives, and (3) the generation of gender-fair language in automatic translation from English to Italian. The challenge relies on three annotated datasets: the GFL-it corpus, which contains Italian texts extracted from administrative documents provided by the University of Brescia; GeNTE, a bilingual test set for gender-neutral rewriting and translation built upon a subset of the Europarl dataset; and Neo-GATE, a bilingual test set designed to assess the use of non-binary neomorphemes in Italian for both fair reformulation and translation tasks. Finally, each task is evaluated with specific metrics: for task 1, the average F1 score computed with BERTScore on each dataset entry; for tasks 2 and 3, accuracy measured with a gender-neutrality classifier (GeNTE) and coverage-weighted accuracy (Neo-GATE).
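To make the aggregation behind the first two metrics concrete, the sketch below averages per-entry F1 scores (task 1) and computes classifier-based accuracy (tasks 2 and 3). It is a minimal illustration under stated assumptions, not the challenge's evaluation code. All function names are hypothetical. `token_f1` is a simple bag-of-tokens overlap F1 swapped in for BERTScore so that the example runs without downloading a model, and `toy_classifier` is a keyword rule standing in for the gender-neutrality classifier.

```python
from statistics import mean
from typing import Callable, Sequence


def average_f1(preds: Sequence[str], golds: Sequence[str],
               f1_fn: Callable[[str, str], float]) -> float:
    """Task 1: mean of per-entry F1 scores.

    In the challenge the per-entry scorer is BERTScore F1; here it is
    pluggable so that any (hypothetical) scorer can be passed in.
    """
    if not preds or len(preds) != len(golds):
        raise ValueError("predictions and references must be non-empty and aligned")
    return mean(f1_fn(p, g) for p, g in zip(preds, golds))


def token_f1(pred: str, gold: str) -> float:
    """HYPOTHETICAL stand-in for BERTScore: bag-of-tokens overlap F1."""
    p, g = pred.lower().split(), gold.lower().split()
    # Count shared tokens, respecting how often each token occurs.
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def classifier_accuracy(outputs: Sequence[str], expected: Sequence[str],
                        classify: Callable[[str], str]) -> float:
    """Tasks 2-3: share of outputs whose predicted label matches the expected one."""
    if not outputs or len(outputs) != len(expected):
        raise ValueError("outputs and expected labels must be non-empty and aligned")
    hits = sum(classify(o) == e for o, e in zip(outputs, expected))
    return hits / len(outputs)


def toy_classifier(sentence: str) -> str:
    """HYPOTHETICAL keyword rule standing in for the real neural classifier."""
    return "neutral" if "persone" in sentence.lower().split() else "gendered"


if __name__ == "__main__":
    print(average_f1(["gli studenti"], ["gli studenti iscritti"], token_f1))
    print(classifier_accuracy(["le persone iscritte", "gli studenti iscritti"],
                              ["neutral", "neutral"], toy_classifier))  # 0.5
```

Swapping in a real BERTScore scorer and the actual classifier would leave the aggregation logic unchanged. Averaging per entry gives every sentence equal weight, regardless of its length.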
