Evaluating Bias in LLMs for Job-Resume Matching: Gender, Race, and Education
Hayate Iso, Pouya Pezeshkpour, Nikita Bhutani, Estevam Hruschka
TL;DR
This work assesses fairness in LLM-driven job-resume matching in a U.S., English-language context by systematically manipulating candidate demographics (gender, race) and educational background across 40 occupations, 12 models, and a large synthetic variant space. Using a 1–10 matching score and ROC AUC evaluation, it shows that explicit biases related to gender and race have diminished in recent models, while implicit biases related to educational prestige persist. The study highlights the need for continuous fairness auditing and advanced mitigation strategies to prevent biased hiring outcomes in real-world HR deployments. The findings have practical implications for industry practitioners seeking equitable AI-powered recruitment and for researchers developing robust bias mitigation techniques.
Abstract
Large Language Models (LLMs) offer the potential to automate hiring by matching job descriptions with candidate resumes, thereby streamlining recruitment and reducing operational costs. However, biases inherent in these models may lead to unfair hiring practices, reinforcing societal prejudices and undermining workplace diversity. This study examines the performance and fairness of LLMs in job-resume matching tasks in an English-language, U.S. context. It evaluates how gender, race, and educational background influence model decisions, providing insight into the fairness and reliability of LLMs in HR applications. Our findings indicate that while recent models have reduced biases related to explicit attributes like gender and race, implicit biases concerning educational background remain significant. These results highlight the need for ongoing evaluation and the development of advanced bias mitigation strategies to ensure equitable hiring practices when using LLMs in industry settings.
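The evaluation setup named in the TL;DR (a 1–10 matching score assessed with ROC AUC) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy labels and scores, the variant names, and the idea of summarizing disparity as a gap between per-variant AUCs are hypothetical and are not taken from the study. ROC AUC is computed here with the standard pairwise definition: the probability that a randomly chosen matching resume receives a higher score than a randomly chosen non-matching one, with ties counted as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positives and negatives.

    A positive scoring above a negative counts 1, a tie counts 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = resume truly matches the job, 0 = it does not.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical 1-10 matching scores for the same resumes, rewritten as two
# variants that differ only in a manipulated attribute (for example, the
# name or the university). These numbers are invented for illustration.
scores_by_variant = {
    "variant_a": [9, 8, 6, 5, 3, 2],
    "variant_b": [9, 7, 4, 5, 3, 2],
}

auc_by_variant = {
    name: roc_auc(labels, scores) for name, scores in scores_by_variant.items()
}

# One possible (assumed) disparity summary: the spread of AUC across variants.
auc_gap = max(auc_by_variant.values()) - min(auc_by_variant.values())

for name, auc in auc_by_variant.items():
    print(f"{name}: AUC = {auc:.3f}")
print(f"AUC gap across variants = {auc_gap:.3f}")
```

Because the underlying qualifications are identical across variants in this sketch, any nonzero gap would come only from the manipulated attribute. A real audit would need many more resume pairs and a significance test before reading a gap as bias.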
