Unveiling and Mitigating Bias in Large Language Model Recommendations: A Path to Fairness
Anindya Bijoy Das, Shahnewaz Karim Sakib
TL;DR
This work analyzes bias in large language model (LLM)-driven recommendations for music, movies, and books, identifying demographic, cultural, and contextual biases that shift genre distributions across groups. It sets up a formal framework built on CLG and CBG prompts and measures bias with $SPD$, $EOD$, $DI$, and $JSD$ across GPT, LLaMA, and Gemini, revealing pervasive disparities. The authors propose two mitigation strategies, fairness-aware prompt engineering and retrieval-augmented generation (RAG), and report substantial bias reductions in numerical experiments, which supports contextual grounding and external data as ways to promote fairness. The findings underscore the importance of fairness-aware design for recommendations that are sensitive to culture and demographics, and offer a practical path toward more equitable LLM-based systems deployed to diverse user populations.
Abstract
Large Language Model (LLM)-based recommendation systems can deliver comprehensive suggestions by analyzing content and user behavior in depth. However, they often inherit biases from skewed training data, favoring mainstream content while underrepresenting diverse or non-traditional options. This study explores the interplay between bias and LLM-based recommendation systems, focusing on music, movie, and book recommendations across diverse demographic and cultural groups. We analyze bias across multiple models (GPT, LLaMA, and Gemini) and find that it has a deep and pervasive impact on outcomes. Intersecting identities and contextual factors, such as socioeconomic status, further amplify biases and complicate fair recommendation across groups. Our findings show that bias in these systems is deeply ingrained, yet even simple interventions such as prompt engineering can significantly reduce it. We further propose a retrieval-augmented generation strategy to mitigate bias more effectively. Numerical experiments validate these strategies, demonstrating both the pervasive nature of bias and the impact of the proposed solutions.
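
To make the metrics named in the TL;DR concrete, the following is a minimal sketch of how $SPD$, $DI$, and $JSD$ could be computed from genre counts for two demographic groups. It uses the textbook definitions (difference of rates, ratio of rates, and base-2 Jensen-Shannon divergence), which may differ in detail from the paper's exact formulation. The function names, group labels, and counts are hypothetical and are not taken from the paper's experiments. $EOD$ is left out because it requires a notion of ground-truth relevance that this summary does not specify.

```python
import math


def normalize(counts):
    """Turn raw genre counts into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one recommendation")
    return {genre: n / total for genre, n in counts.items()}


def statistical_parity_difference(rate_a, rate_b):
    """SPD (textbook form): rate for group A minus rate for group B."""
    return rate_a - rate_b


def disparate_impact(rate_a, rate_b):
    """DI (textbook form): rate for group A divided by rate for group B."""
    if rate_b == 0:
        raise ValueError("disparate impact is undefined when rate_b is 0")
    return rate_a / rate_b


def jensen_shannon_divergence(p, q):
    """JSD with base-2 logs, so the value lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Hypothetical genre counts for two groups (illustrative only).
    group_a = normalize({"pop": 6, "rock": 3, "folk": 1})
    group_b = normalize({"pop": 3, "rock": 3, "folk": 4})
    genres = sorted(group_a)

    for genre in genres:
        spd = statistical_parity_difference(group_a[genre], group_b[genre])
        di = disparate_impact(group_a[genre], group_b[genre])
        print(f"{genre}: SPD={spd:+.2f} DI={di:.2f}")

    jsd = jensen_shannon_divergence(
        [group_a[g] for g in genres], [group_b[g] for g in genres]
    )
    print(f"JSD between genre distributions: {jsd:.3f}")
```

Under these definitions, an SPD of 0 and a DI of 1 indicate parity for a given genre, and a JSD of 0 means the two groups receive identical genre distributions. Note that SciPy's `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon distance, which is the square root of the divergence computed here.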
