Gender Bias in Large Language Models across Multiple Languages

Jinman Zhao; Yitian Ding; Chen Jia; Yining Wang; Zifan Qian

TL;DR

This work investigates the outputs of the GPT series of LLMs in various languages using three measurement methods and reveals significant gender biases across all the languages the authors examined.

Abstract

With the growing deployment of large language models (LLMs) across various applications, assessing the influence of gender biases embedded in LLMs becomes crucial. Gender bias in natural language processing (NLP) has received considerable attention, particularly for English, but gender bias in languages other than English remains under-explored. In this work, we examine gender bias in LLM-generated outputs for different languages. We use three measurements: 1) gender bias in selecting descriptive words given a gender-related context; 2) gender bias in selecting gender-related pronouns (she/he) given the descriptive words; and 3) gender bias in the topics of LLM-generated dialogues. We investigate the outputs of the GPT series of LLMs in various languages using these three measurement methods. Our findings reveal significant gender biases across all the languages we examined.

Paper Structure (39 sections, 3 equations, 7 figures, 27 tables)

Introduction
Related Work
Fairness Measurements
Gender Bias in Language Models
Gender Bias in Multiple Languages
Method
Bias in Descriptive Word Selection
Evaluation.
Bias in Gendered Role Selection
Evaluation.
Bias in Dialogue Topics
Experiments
Experimental Setup
Language selection.
Model selection.
...and 24 more sections

Figures (7)

Figure 1: Bias in descriptive word selection for multiple languages based on GPT-3. Outlook is omitted because the model generates too few for some languages.
Figure 2: Bias in descriptive word selection for multiple languages based on ChatGPT. The upper bound is set to 2.
Figure 3: Bias in descriptive word selection for multiple languages based on GPT-4. The upper bound is set to 2.
Figure 4: Bias in gendered role selection for multiple languages based on ChatGPT. The upper bound is set to 2.
Figure 5: Bias in Dialogues based on GPT-4.
...and 2 more figures
