Unveiling and Mitigating Bias in Large Language Model Recommendations: A Path to Fairness

Anindya Bijoy Das; Shahnewaz Karim Sakib

TL;DR

This work analyzes bias in large language model (LLM)-driven recommendations for music, movies, and books, identifying demographic, cultural, and contextual biases that shift genre distributions across user groups. It establishes a formal framework built on Context-Less Generation (CLG) and Context-Based Generation (CBG) prompts, and evaluates bias with statistical parity difference ($SPD$), equal opportunity difference ($EOD$), disparate impact ($DI$), and Jensen-Shannon divergence ($JSD$) across GPT, LLaMA, and Gemini, revealing pervasive disparities. The authors propose two mitigation strategies, fairness-aware prompt engineering and retrieval-augmented generation (RAG), and report substantial bias reductions in numerical experiments, supporting the use of contextual grounding and external data to promote fairness. The findings underscore the importance of fairness-aware design in cross-cultural, demographic-sensitive recommendations and offer a practical path toward more equitable LLM-based systems deployed to diverse user populations.
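To make the metrics concrete, the following is a minimal sketch of how $JSD$, $SPD$, and $DI$ could be computed on genre distributions for two user profiles. It assumes the textbook definitions (base-2 Jensen-Shannon divergence, a difference of genre shares for $SPD$, and a ratio of genre shares for $DI$); the paper's exact formulations may differ. The function names and the genre lists are hypothetical and serve only as illustration. $EOD$ is omitted because it requires ground-truth relevance labels, which this summary does not describe.

```python
import math
from collections import Counter


def genre_distribution(genres, vocabulary):
    """Turn a list of genre labels into a probability vector over `vocabulary`.

    Assumes at least one label in `genres` belongs to `vocabulary`.
    """
    counts = Counter(genres)
    total = sum(counts[g] for g in vocabulary)
    return [counts[g] / total for g in vocabulary]


def kl_divergence(p, q):
    """Base-2 KL divergence; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence, bounded in [0, 1] with base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def spd(p, q, i):
    """Statistical parity difference for genre i (assumed: difference of shares)."""
    return p[i] - q[i]


def disparate_impact(p, q, i):
    """Disparate impact for genre i (assumed: ratio of shares)."""
    return p[i] / q[i] if q[i] > 0 else math.inf


# Hypothetical recommendations for two user profiles (illustrative data only).
vocabulary = ["drama", "action", "romance"]
user_a = ["drama", "drama", "romance", "romance", "action"]
user_b = ["action", "action", "action", "drama", "romance"]

p = genre_distribution(user_a, vocabulary)
q = genre_distribution(user_b, vocabulary)

print("JSD:", jsd(p, q))
print("SPD (action):", spd(p, q, vocabulary.index("action")))
print("DI (action):", disparate_impact(p, q, vocabulary.index("action")))
```

Under these definitions, a $JSD$ of 0 means the two profiles receive identical genre mixes, while $SPD$ near 0 and $DI$ near 1 indicate that a given genre is recommended at similar rates to both groups.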

Abstract

Large Language Model (LLM)-based recommendation systems can deliver comprehensive suggestions by analyzing content and user behavior in depth. However, they often inherit biases from skewed training data, favoring mainstream content while underrepresenting diverse or non-traditional options. This study explores the interplay between bias and LLM-based recommendation systems, focusing on movie, song, and book recommendations across diverse demographic and cultural groups. We analyze bias across multiple models (GPT, LLaMA, and Gemini) and find that it has a deep and pervasive impact on outcomes. Intersecting identities and contextual factors, such as socioeconomic status, further amplify biases and complicate fair recommendations across groups. Although bias in these systems is deeply ingrained, even simple interventions like prompt engineering can significantly reduce it. We further propose a retrieval-augmented generation strategy to mitigate bias more effectively. Numerical experiments validate these strategies, demonstrating both the pervasive nature of bias and the impact of the proposed solutions.

Paper Structure (27 sections, 1 equation, 15 figures, 7 tables)

Introduction
Framework and Research Contributions
Related Works
Problem Formulation
Summary of Contributions
Data Acquisition and Synthesis
Prompt Design
Context-Less Generation (CLG)
Context-Based Generation (CBG)
Methodology for Genre Classification
Comparison among Recommendations
Bias in LLM Recommendations
Context-less generation (CLG)
Context-based generation (CBG)
Comparison among different LLMs
...and 12 more sections

Figures (15)

Figure 1: Applications of LLMs within Big Data and Data Science.
Figure 2: Genre distribution for the recommended 25 movies for Ashley, a 40-year-old female chef (top), and Thomas, a 50-year-old male writer (bottom).
Figure 3: JSD between LLaMA-recommended genre distributions for pairs of users: (a) book genres for a 20-year-old male entrepreneur and a 30-year-old female musician, (b) book genres for a 50-year-old female student and a 30-year-old male chef, (c) movie genres for a 50-year-old female artist and a 20-year-old male artist, (d) movie genres for a 20-year-old male athlete and a 40-year-old female comedian, (e) song genres for a 50-year-old female chef and a 60-year-old male actor, and (f) song genres for a 50-year-old male writer and a 30-year-old female writer.
Figure 4: Demographic bias in LLM-based recommendations (for movies, songs, and books) under CLG.
Figure 5: Cultural bias in LLM-based recommendations
...and 10 more figures
