Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender?
Haozhe An, Christabel Acquaye, Colin Wang, Zongxia Li, Rachel Rudinger
TL;DR
This study investigates whether large language models exhibit name-based discrimination in hiring decisions when asked to draft outcome emails to applicants. Using 820 templated prompts across 41 occupations and 300 first names spanning three racial/ethnic groups and two genders, the authors generate up to 756,000 emails per model and label each email's outcome with a high-accuracy SVM classifier. Across five models, the results show small but statistically significant biases: White male names are often favored and Hispanic male names are consistently disadvantaged, though the effects are sensitive to the prompt and the occupation. The findings raise concerns about fairness in AI-assisted hiring and highlight the need for broader, more representative auditing of LLMs before they are deployed in decision-making processes.
Abstract
We examine whether large language models (LLMs) exhibit race- and gender-based name discrimination in hiring decisions, similar to classic findings in the social sciences (Bertrand and Mullainathan, 2004). We design a series of templatic prompts that ask LLMs to write an email to a named job applicant informing them of a hiring decision. By manipulating the applicant's first name, we measure the effect of perceived race, ethnicity, and gender on the probability that the LLM generates an acceptance or rejection email. We find that the hiring decisions of LLMs in many settings are more likely to favor White applicants over Hispanic applicants. In aggregate, the groups with the highest and lowest acceptance rates are, respectively, masculine White names and masculine Hispanic names. However, the comparative acceptance rates by group vary under different templatic settings, suggesting that LLMs' race- and gender-sensitivity may be idiosyncratic and prompt-sensitive.
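The name-substitution design described above can be illustrated with a minimal Python sketch. It is not the authors' code: the template wording, the occupations, the placeholder names, and the helper functions `build_prompts` and `acceptance_rates` are all assumptions made for illustration, and the steps of querying a model and classifying each email as an acceptance or rejection are omitted.

```python
from collections import defaultdict
from itertools import product

# Hypothetical placeholders; the study's actual templates, occupations,
# and first names are not reproduced here.
TEMPLATES = [
    "Write an email informing {name} about the application decision "
    "for the role of {occupation}.",
]
OCCUPATIONS = ["accountant", "nurse"]
# first name -> (perceived race/ethnicity, perceived gender)
NAMES = {
    "NameA": ("White", "masculine"),
    "NameB": ("Hispanic", "masculine"),
}


def build_prompts(templates, occupations, names):
    """Yield one prompt record per (template, occupation, name) combination.

    Only the first name varies within a (template, occupation) pair, so any
    difference in outcomes can be attributed to the name.
    """
    grid = product(enumerate(templates), occupations, names.items())
    for (t_idx, template), occupation, (name, group) in grid:
        yield {
            "template": t_idx,
            "occupation": occupation,
            "name": name,
            "group": group,
            "prompt": template.format(name=name, occupation=occupation),
        }


def acceptance_rates(records):
    """Compute the acceptance rate per group.

    `records` is an iterable of (group, label) pairs, where label is assumed
    to be "accept" or "reject" (e.g. as assigned by an outcome classifier).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [accepted, total]
    for group, label in records:
        counts[group][0] += int(label == "accept")
        counts[group][1] += 1
    return {group: acc / total for group, (acc, total) in counts.items()}
```

Under these assumptions, each record from `build_prompts(TEMPLATES, OCCUPATIONS, NAMES)` would be sent to a model, the resulting email labeled, and the `(group, label)` pairs passed to `acceptance_rates` to compare groups.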
