Revealing Hidden Bias in AI: Lessons from Large Language Models

Django Beatty, Kritsada Masanthia, Teepakorn Kaphol, Niphan Sethi

TL;DR

Anonymization reduces certain biases in LLM-generated candidate interview reports, particularly gender bias, but its effectiveness varies across models and bias types. The study also suggests best practices for minimizing bias in AI applications, promoting fairness and inclusivity.

Abstract

As large language models (LLMs) become integral to recruitment processes, concerns about AI-induced bias have intensified. This study examines biases in candidate interview reports generated by Claude 3.5 Sonnet, GPT-4o, Gemini 1.5, and Llama 3.1 405B, focusing on characteristics such as gender, race, and age. We evaluate the effectiveness of LLM-based anonymization in reducing these biases. Findings indicate that while anonymization reduces certain biases, particularly gender bias, the degree of effectiveness varies across models and bias types. Notably, Llama 3.1 405B exhibited the lowest overall bias. Moreover, our methodology of comparing anonymized and non-anonymized data offers a novel approach to assessing inherent biases in LLMs beyond recruitment applications. This study underscores the importance of careful LLM selection and suggests best practices for minimizing bias in AI applications, promoting fairness and inclusivity.

Paper Structure

This paper contains 65 sections, 18 figures, 3 tables.

Figures (18)

  • Figure 1: High-level architecture for generating CV analysis
  • Figure 2: High-level architecture for the process of conducting LLM bias research
  • Figure 3: Example of a generated interview question report
  • Figure 4: CV classification Approach 1: document cluster (t-SNE)
  • Figure 5: CV classification Approach 2: number of CVs in each category
  • ...and 13 more figures