Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era
Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, Jun Xu
TL;DR
This survey reframes bias and unfairness in information retrieval in the LLM era as distribution-mismatch problems, unifying definitions and organizing mitigations into data sampling and distribution reconstruction. It catalogs fifteen bias and unfairness phenomena across data collection, model development, and result evaluation, linking each to concrete strategies such as data augmentation, reweighting, prompting, and RLHF. The paper provides a structured, theory-informed roadmap for researchers and practitioners to diagnose and mitigate biases in LLM-enhanced IR systems, and highlights open challenges including feedback loops, unified mitigation, theoretical guarantees, and real-world benchmarks. By maintaining a public resource and proposing a cohesive framework, it aims to accelerate progress toward fairer, more reliable IR in the LLM era.
Abstract
With the rapid advancement of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly biases and unfairness, which may threaten the information ecosystem. In this paper, we present a comprehensive survey of existing work on emerging and pressing bias and unfairness issues in IR systems that integrate LLMs. We first unify bias and unfairness issues as distribution mismatch problems, providing a groundwork for categorizing mitigation strategies through distribution alignment. Subsequently, we systematically examine the specific bias and unfairness issues arising from three critical stages of LLM integration into IR systems: data collection, model development, and result evaluation. In doing so, we review and analyze recent literature, focusing on the definitions, characteristics, and corresponding mitigation strategies associated with these issues. Finally, we identify and highlight open problems and challenges for future work, aiming to inspire researchers and stakeholders in the IR field and beyond to better understand and mitigate bias and unfairness issues in IR in the LLM era. We also continuously maintain a GitHub repository of relevant papers and resources in this rising direction at https://github.com/KID-22/LLM-IR-Bias-Fairness-Survey.
