BAD: BiAs Detection for Large Language Models in the context of candidate screening

Nam Ho Koh; Joseph Plata; Joyce Chai

TL;DR

This work investigates how large language models may perpetuate or mitigate biases in candidate screening by generating a demographically informed resume dataset and evaluating biases with a context-association test (CAT). It combines resume-generation experiments with statistical bias analysis (chi-squared tests) and CAT metrics to quantify stereotyping tendencies in outputs from GPT-4 and GPT-3.5-turbo, revealing model-dependent variation in bias and highlighting practical implications for hiring systems. The authors open-source their CAT framework and dataset to promote transparency and further study, and they call for careful deployment and broader evaluation across models and domains. Overall, the study provides a multi-angle assessment of LLM-induced bias in screening contexts and a baseline for ongoing fairness audits in HR technology.
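The two quantitative tools named above, chi-squared tests and CAT metrics, can be illustrated with a short Python sketch. It is not the authors' released CAT framework. It assumes that generated resumes have already been labeled with an estimated demographic group and a job area (arranged as a contingency table), and it assumes a simplified StereoSet-style scoring in which each CAT item is answered by picking a stereotypical, anti-stereotypical, or unrelated option. The function names (chi2_sf, chi_squared_independence, cat_scores) and the toy counts are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-squared distribution, integer dof >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        # Even dof: Q(k/2, y) = exp(-y) * sum_{i=0}^{k/2 - 1} y^i / i!
        term = total = 1.0
        for i in range(1, dof // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd dof: start from Q(1/2, y) = erfc(sqrt(y)), then apply the recurrence
    # Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1) until a + 1 = k/2.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (dof - 1) // 2 + 1):
        a = i - 0.5
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
    return total


def chi_squared_independence(table):
    """Pearson chi-squared test of independence; returns (statistic, dof, p_value).

    Assumed layout: rows are estimated demographic groups, columns are job areas.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a positive total")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)


def cat_scores(picks):
    """Simplified StereoSet-style CAT scores (an assumed scoring, not the paper's).

    Each pick is "stereotype", "anti-stereotype", or "unrelated".
    lms:  % of items where a meaningful option beat the unrelated one (100 is ideal).
    ss:   % of meaningful picks that were stereotypical (50 is unbiased).
    icat: lms * min(ss, 100 - ss) / 50 (100 is ideal).
    """
    counts = Counter(picks)
    stereo, anti = counts["stereotype"], counts["anti-stereotype"]
    meaningful = stereo + anti
    if meaningful == 0:
        raise ValueError("ss is undefined when no meaningful option was chosen")
    total = meaningful + counts["unrelated"]
    lms = 100.0 * meaningful / total
    ss = 100.0 * stereo / meaningful
    return {"lms": lms, "ss": ss, "icat": lms * min(ss, 100.0 - ss) / 50.0}


if __name__ == "__main__":
    # Hypothetical toy counts, NOT results from the paper.
    table = [[40, 10], [15, 35]]  # rows: group A/B; columns: job area X/Y
    stat, dof, p = chi_squared_independence(table)
    print(f"chi2={stat:.2f}, dof={dof}, p={p:.4g}")
    picks = ["stereotype"] * 30 + ["anti-stereotype"] * 20 + ["unrelated"] * 10
    print(cat_scores(picks))
```

Computing the chi-squared tail with only the standard library keeps the sketch dependency-free. In practice, scipy.stats.chi2_contingency performs the same test, but note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ slightly from this uncorrected version.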

Abstract

Applicant Tracking Systems (ATS) have allowed talent managers, recruiters, and college admissions committees to process large volumes of candidate applications efficiently. Traditionally, this screening process was conducted manually, which created major bottlenecks given the volume of applications and introduced many instances of human bias. The advent of large language models (LLMs) such as ChatGPT, and the prospect of incorporating them into current automated application screening, raises additional bias and fairness issues that must be addressed. In this project, we aim to identify and quantify instances of social bias in ChatGPT and other OpenAI LLMs in the context of candidate screening, in order to demonstrate how the use of these models could perpetuate existing biases and inequalities in the hiring process.

Paper Structure (19 sections, 2 equations, 6 figures, 2 tables)

Introduction
Motivation
Related Works
Approach
Datasets
Resume Generation Dataset Creation
Evaluation
Resume Generation Results
Statistical test
Context Awareness Test (CAT) Results
Discussion
Limitations & Future Works
Conclusion
Work Division
Appendix
...and 4 more sections

Figures (6)

Figure 1: Breakdown of estimated ethnicity and job area
Figure 2: Breakdown of estimated gender and job area
Figure 3: Relative Representation for Software Engineering
Figure 4: Relative Representation for Marketing
Figure 5: Distribution of Estimated Ethnicity
...and 1 more figure
