A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications

Sunayana Sitaram, Adrian de Wynter, Isobel McCrum, Qilong Gu, Si-Qing Chen

TL;DR

This paper tackles misgendering in multilingual LLM applications by developing language-specific guardrails through participatory design across 42 languages and testing them in a meeting-transcript summarization task. It employs a human-in-the-loop pipeline to generate and verify synthetic multilingual transcripts, then evaluates guardrails with both human judges and LLM evaluators using defined metrics for misgendering and quality. Results show substantial reductions in gender mistakes and assumptions without sacrificing output quality, while also revealing limitations in LLM evaluators, especially for low-resource languages. The study releases the guardrails and a 42-language synthetic dataset to enable broader research and promote culturally informed responsible AI across languages and contexts.

Abstract

Misgendering is the act of referring to someone by a gender that does not match their chosen identity. It marginalizes and undermines a person's sense of self, causing significant harm. English has clear-cut strategies for avoiding misgendering, such as the use of the pronoun "they". However, other languages pose unique challenges due to both grammatical and cultural constructs. In this work we develop methodologies to assess and mitigate misgendering across 42 languages and dialects, using a participatory-design approach to create effective and appropriate guardrails for each of them. We test these guardrails in a standard LLM-based application (meeting transcript summarization), where both the data generation and the annotation steps follow a human-in-the-loop approach. We find that the proposed guardrails are very effective in reducing misgendering rates in the generated summaries across all languages, without incurring a loss of quality. Our human-in-the-loop approach demonstrates a feasible method for scaling inclusive and responsible AI-based solutions across multiple languages and cultures. We release the guardrails and a synthetic dataset encompassing 42 languages, along with human and LLM-judge evaluations, to encourage further research on this subject.

Paper Structure

This paper contains 22 sections, 13 figures, 8 tables.

Figures (13)

  • Figure 1: Number of genders in different languages for the "number of genders" feature from the World Atlas of Language Structures (WALS).
  • Figure 2: Data generation pipeline. The generator takes in a list of participants (with specified or non-specified genders), a topic, and a language (not pictured). It generates a transcript that is sent to the verifier along with its input. The verifier decides whether the transcript fulfilled the requirements: if not, it sends it back to the Generator along with feedback for improvement. All final transcripts are human-verified and corrected.
  • Figure 3: GA Results for the human individual evaluation, broken down by language. Overall, the GA score was consistently lowered across all languages.
  • Figure 4: GA Score grouped by language class. There are no class 2 languages in our work. GPT-4o was off in low-resource languages: it undershot in the base scenario, and overestimated GA with guardrails enabled.
  • Figure 5: GM Score grouped by language class. There are no class 2 languages in our work. GPT-4o had the worst performance, mostly led by its considerable disparity in language availability classes 3 and 4.
  • ...and 8 more figures
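
The generator/verifier loop described in the Figure 2 caption can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `Request`, `generate_transcript`, `toy_generate`, and `toy_verify` are hypothetical, the toy functions stand in for LLM calls, and the `max_rounds` cap is an assumption (the caption does not state a retry limit). Per the caption, every final transcript would still go to human verification and correction, whether or not the verifier accepted it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    # Each participant is (name, gender); gender is None when not specified.
    participants: list[tuple[str, Optional[str]]]
    topic: str
    language: str


# generate(request, feedback) -> transcript; feedback is None on the first attempt.
Generator = Callable[[Request, Optional[str]], str]
# verify(request, transcript) -> (accepted, feedback for the generator).
Verifier = Callable[[Request, str], tuple[bool, str]]


def generate_transcript(request: Request, generate: Generator, verify: Verifier,
                        max_rounds: int = 3) -> tuple[str, bool]:
    """Regenerate until the verifier accepts or max_rounds (assumed cap) is reached.

    Returns the last transcript and whether the verifier accepted it.
    """
    feedback: Optional[str] = None
    transcript = ""
    for _ in range(max_rounds):
        transcript = generate(request, feedback)
        accepted, feedback = verify(request, transcript)
        if accepted:
            return transcript, True
    return transcript, False


def toy_generate(request: Request, feedback: Optional[str]) -> str:
    """Stand-in for an LLM generator: the first draft omits the last participant."""
    names = [name for name, _ in request.participants]
    if feedback is None:
        names = names[:-1]
    return "\n".join(f"{name}: notes on {request.topic}" for name in names)


def toy_verify(request: Request, transcript: str) -> tuple[bool, str]:
    """Stand-in for an LLM verifier: checks that every participant speaks."""
    missing = [name for name, _ in request.participants
               if f"{name}:" not in transcript]
    if missing:
        return False, "missing speakers: " + ", ".join(missing)
    return True, ""


if __name__ == "__main__":
    request = Request([("Ana", "female"), ("Sam", None)], "budget", "es")
    transcript, accepted = generate_transcript(request, toy_generate, toy_verify)
    print(accepted)
    print(transcript)
```

Returning the acceptance flag alongside the transcript keeps rejected drafts visible, so they can be prioritized in the human review step instead of being silently dropped.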