Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era

Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, Jun Xu

TL;DR

This survey reframes bias and unfairness in information retrieval under the LLM era as distribution-mismatch problems, unifying definitions and organizing mitigations into data sampling and distribution reconstruction. It catalogs fifteen bias and unfairness phenomena across data collection, model development, and result evaluation, linking each to concrete strategies such as data augmentation, reweighting, prompting, and RLHF. The paper provides a structured, theory-informed roadmap for researchers and practitioners to diagnose and mitigate biases in LLM-enhanced IR systems, and highlights open challenges including feedback loops, unified mitigation, theoretical guarantees, and real-world benchmarks. By maintaining a public resource and proposing a cohesive framework, it aims to accelerate progress toward fairer, more reliable IR in the LLM era.

Abstract

With the rapid advancement of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly in terms of bias and unfairness, which may threaten the information ecosystem. In this paper, we present a comprehensive survey of existing work on emerging and pressing bias and unfairness issues in IR systems that arise from the integration of LLMs. We first unify bias and unfairness issues as distribution mismatch problems, providing a groundwork for categorizing various mitigation strategies through distribution alignment. Subsequently, we systematically examine the specific bias and unfairness issues arising from three critical stages of LLM integration into IR systems: data collection, model development, and result evaluation. In doing so, we review and analyze recent literature, focusing on the definitions, characteristics, and corresponding mitigation strategies associated with these issues. Finally, we identify and highlight some open problems and challenges for future work, aiming to inspire researchers and stakeholders in the IR field and beyond to better understand and mitigate bias and unfairness issues in IR in the LLM era. We also maintain a GitHub repository of relevant papers and resources in this rising direction at https://github.com/KID-22/LLM-IR-Bias-Fairness-Survey.

Paper Structure

This paper contains 30 sections, 1 equation, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the three stages at which LLMs intersect with IR systems. (a) LLM-generated content as a new data source for IR. (b) Incorporating LLMs to enhance, or serve as, IR models. (c) Adopting LLMs as result evaluators in IR systems.
  • Figure 2: Illustration of different types of mitigation strategies from a unified view of distribution alignment.