Beyond Binary Gender Labels: Revealing Gender Biases in LLMs through Gender-Neutral Name Predictions

Zhiwen You; HaeJin Lee; Shubhanshu Mishra; Sullam Jeoung; Apratim Mishra; Jinseok Kim; Jana Diesner

TL;DR

This paper addresses bias in name-based gender prediction by introducing a "neutral" category that moves beyond binary labels, and it tests whether adding birth-year information improves predictions. It runs a large set of experiments on three SSA datasets (US, Canada, France), using balanced and dynamic-gender-label subsets, and evaluates both fine-tuned foundation models (BERT, RoBERTa, CharBERT) and several LLMs (GPT-3.5, Llama 2/3, Mixtral, Claude Haiku) under zero- and few-shot prompting. LLMs reliably predict the gender of male and female names but perform poorly on gender-neutral ones, and non-English names are generally harder. Adding the birth year often degrades LLM performance, although some models, such as Mixtral, may benefit when predicting neutral names. The study highlights significant limitations and biases in current LLMs for gender inference and underscores the need for cautious, inclusive labeling and for attention to language- and time-dependent biases in downstream analytics.
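The prompting setup summarized above (a three-way label set, zero- or few-shot demonstrations, and an optional birth year) can be sketched as follows. This is a minimal illustration under stated assumptions: the prompt wording, the helpers `build_prompt` and `parse_label`, and the answer-parsing rule are hypothetical and are not the prompts or parsing used in the paper; no model API is called.

```python
import re
from typing import Optional, Sequence, Tuple

# Three-way label set (male, female, neutral) as described above.
LABELS = ("male", "female", "neutral")
# Word-boundary match so "male" is not found inside "female".
_LABEL_RE = re.compile(r"\b(male|female|neutral)\b")


def _describe(name: str, year: Optional[int]) -> str:
    # Attach the birth year as temporal context only when one is given.
    return f"Name: {name}" + (f" (born {year})" if year is not None else "")


def build_prompt(
    name: str,
    birth_year: Optional[int] = None,
    examples: Sequence[Tuple[str, Optional[int], str]] = (),
) -> str:
    """Zero-shot prompt when `examples` is empty, few-shot otherwise.

    Hypothetical wording for illustration; not the prompt used in the paper.
    """
    parts = [
        "Predict the gender most associated with the first name. "
        f"Answer with exactly one of: {', '.join(LABELS)}."
    ]
    # Few-shot: labeled demonstrations precede the query.
    for ex_name, ex_year, ex_label in examples:
        parts.append(f"{_describe(ex_name, ex_year)}\nGender: {ex_label}")
    parts.append(f"{_describe(name, birth_year)}\nGender:")
    return "\n\n".join(parts)


def parse_label(reply: str) -> Optional[str]:
    """Return the first label mentioned in a model reply, or None if there is none."""
    match = _LABEL_RE.search(reply.lower())
    return match.group(1) if match else None


if __name__ == "__main__":
    # Zero-shot query with a birth year, then a one-shot query without one.
    print(build_prompt("Victory", birth_year=2016))
    print(build_prompt("Alex", examples=[("John", None, "male")]))
    print(parse_label("Female."))  # female
```

Comparing runs with and without `birth_year` on the same names is one simple way to isolate the effect of temporal context that the study examines.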

Abstract

Name-based gender prediction has traditionally classified individuals as either female or male based on their names, using a binary classification system. This binary approach can be problematic for gender-neutral names that do not align with any one gender, among other reasons. Relying solely on binary gender categories without recognizing gender-neutral names can reduce the inclusiveness of gender prediction tasks. We introduce an additional gender category, "neutral", to study and address potential gender biases in Large Language Models (LLMs). We evaluate the performance of several foundation models and LLMs in predicting gender from first names alone. Additionally, we investigate whether adding birth years improves the accuracy of gender prediction by accounting for shifting associations between names and genders over time. Our findings indicate that most LLMs identify male and female names with high accuracy (over 80%) but struggle with gender-neutral names (under 40%), and that gender prediction is more accurate for English-based first names than for non-English names. The experimental results show that incorporating the birth year does not improve the overall accuracy of gender prediction, especially for names with evolving gender associations. We recommend caution when applying LLMs for gender identification in downstream tasks, particularly when dealing with non-binary gender labels.
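Because overall accuracy can hide poor results on a smaller class, per-category figures such as those quoted above (over 80% for male and female names, under 40% for neutral ones) are best read as accuracy computed separately within each gold label. A minimal sketch, assuming predictions have already been mapped to the three labels; the function name `per_label_accuracy` and the toy data are hypothetical and are not results from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def per_label_accuracy(
    pairs: Iterable[Tuple[str, Optional[str]]],
) -> Dict[str, float]:
    """Accuracy within each gold label, plus overall accuracy.

    `pairs` holds (gold_label, predicted_label). A None prediction (for example,
    an unparseable model reply) is counted as wrong.
    """
    total: Counter = Counter()
    correct: Counter = Counter()
    for gold, pred in pairs:
        total[gold] += 1
        correct[gold] += int(pred == gold)
    if not total:
        return {}
    scores = {label: correct[label] / total[label] for label in total}
    scores["overall"] = sum(correct.values()) / sum(total.values())
    return scores


if __name__ == "__main__":
    # Toy predictions (invented, not results from the paper):
    # 10 male, 10 female and 4 neutral gold names.
    pairs = (
        [("male", "male")] * 9 + [("male", "female")]
        + [("female", "female")] * 9 + [("female", "male")]
        + [("neutral", "male")] * 3 + [("neutral", "neutral")]
    )
    print(per_label_accuracy(pairs))
    # male 0.9, female 0.9, neutral 0.25, overall ~0.79
```

In this toy case the overall score (about 0.79) looks acceptable even though only one in four neutral names is labeled correctly, which is why reporting accuracy per label matters.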

Paper Structure (20 sections, 4 figures, 7 tables)


Introduction
Related Work
Experiments
Data
Gender Prediction Models
Results
Discussion
LLMs are poor at accurately predicting gender.
Including temporal information mostly degrades accuracy.
LLMs have worst performance on gender-neutral names.
LLM performance is biased towards recent year patterns.
Suggestions for practitioners
Conclusion
Bias Statement
Dataset Statistics
(5 further sections not listed)

Figures (4)

Figure 1: Example of an LLM predicting different gender labels over time for the same first name. "Victory" was labeled Male in 1933, and the LLM predicted it correctly. By 2016, however, the name had become predominantly gender-neutral, yet the LLM still incorrectly predicted Male.
Figure 2: Temporal-level comparison of five LLMs on the US SSA dynamic gender label dataset, given the results in the corresponding table (tab:duplicated-performance). We report the overall accuracy of gender prediction for each year.
Figure 3: Temporal-level comparison of all LLMs on the Canada SSA dynamic gender label dataset, given the results in the corresponding table (tab:duplicated-performance).
Figure 4: Temporal-level comparison of all LLMs on the France SSA dynamic gender label dataset, given the results in the corresponding table (tab:duplicated-performance).
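The dynamic-label setting behind these figures can be illustrated with a small sketch: a name's gold label is derived per year from birth counts, and accuracy is then aggregated per year, as in the per-year plots. The 0.6 majority-share threshold, the helpers `label_for_year` and `accuracy_by_year`, and the birth counts in the usage example are all hypothetical assumptions; the paper's actual labeling rule is not stated in this section.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical threshold: a name counts as "neutral" in a given year when neither
# gender accounts for at least this share of births. Not the paper's rule.
MAJORITY_SHARE = 0.6


def label_for_year(female: int, male: int, majority_share: float = MAJORITY_SHARE) -> str:
    """Assign a three-way label from one year's female/male birth counts."""
    total = female + male
    if total == 0:
        raise ValueError("no births recorded for this name and year")
    if female / total >= majority_share:
        return "female"
    if male / total >= majority_share:
        return "male"
    return "neutral"


def accuracy_by_year(records: List[Tuple[str, int, str, str]]) -> Dict[int, float]:
    """records: (name, year, gold_label, predicted_label) -> accuracy per year."""
    hits: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for _name, year, gold, pred in records:
        counts[year] += 1
        hits[year] += int(gold == pred)
    return {year: hits[year] / counts[year] for year in sorted(counts)}


if __name__ == "__main__":
    # Invented counts for illustration only (not real SSA figures).
    gold_1933 = label_for_year(female=0, male=12)   # "male"
    gold_2016 = label_for_year(female=40, male=45)  # "neutral" (45/85 < 0.6)
    records = [
        ("Victory", 1933, gold_1933, "male"),
        ("Victory", 2016, gold_2016, "male"),
    ]
    print(accuracy_by_year(records))  # {1933: 1.0, 2016: 0.0}
```

With these invented counts, "Victory" is labeled male in 1933 and neutral in 2016, so a model that always answers male scores 1.0 in 1933 and 0.0 in 2016, mirroring the pattern described in Figure 1.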
