Evaluating Social Bias in RAG Systems: When External Context Helps and Reasoning Hurts
Shweta Parihar, Lu Cheng
TL;DR
This work systematically evaluates social bias in Retrieval-Augmented Generation (RAG) across more than 13 bias types using two retrieval corpora (WikiText-103 and C4) and multiple bias benchmarks (SCW, BOLD, HolisticBias). It shows that standard RAG generally reduces bias by diversifying contextual grounding, whereas adding Chain-of-Thought (CoT) prompting within RAG improves accuracy but paradoxically increases bias, revealing a fairness-accuracy trade-off. A faithfulness analysis indicates that CoT explanations are largely grounded in retrieved evidence, yet more explicit reasoning can amplify biased associations. The findings highlight the need for bias-aware reasoning frameworks and carefully curated external context, so that RAG's fairness benefits are retained without the reasoning process amplifying bias.
Abstract
Social biases inherent in large language models (LLMs) raise significant fairness concerns. Retrieval-Augmented Generation (RAG) architectures, which retrieve from external knowledge sources to enhance the generative capabilities of LLMs, remain susceptible to the same bias-related challenges. This work evaluates and seeks to explain the social bias implications of RAG. Through extensive experiments across various retrieval corpora, LLMs, and bias evaluation datasets, covering more than 13 bias types, we find, surprisingly, that RAG reduces bias. This suggests that external context can help counteract stereotype-driven predictions, potentially improving fairness by diversifying the contextual grounding of the model's outputs. To better understand this phenomenon, we then examine the model's reasoning process by integrating Chain-of-Thought (CoT) prompting into RAG and assessing the faithfulness of the resulting CoT. Our experiments reveal that the model's bias inclinations shift between stereotype and anti-stereotype responses as more contextual information is incorporated from the retrieved documents. Interestingly, although CoT improves accuracy, it increases overall bias across datasets, in contrast to the bias reduction observed with standard RAG. This highlights the need for bias-aware reasoning frameworks that can mitigate this trade-off.
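To make the idea of a shift between stereotype and anti-stereotype responses concrete, the following is a minimal illustrative sketch, not the metric used in this work. It assumes each model response has already been labeled as "stereotype", "anti-stereotype", or "unrelated" (hypothetical labels), and it computes the share of stereotype responses among the two bias-relevant labels. The function name `stereotype_rate` and the example data are assumptions introduced here for illustration only.

```python
from collections import Counter


def stereotype_rate(choices):
    """Fraction of stereotype picks among stereotype and anti-stereotype picks.

    `choices` is a list of hypothetical labels: "stereotype",
    "anti-stereotype", or "unrelated". Returns None when no bias-relevant
    label is present, since the rate is undefined in that case.
    """
    counts = Counter(choices)
    stereo = counts["stereotype"]
    anti = counts["anti-stereotype"]
    if stereo + anti == 0:
        return None
    return stereo / (stereo + anti)


# Made-up labels for illustration: responses grouped by the number of
# retrieved documents added to the prompt (0 means no retrieval).
labels_by_num_docs = {
    0: ["stereotype", "stereotype", "anti-stereotype", "unrelated"],
    2: ["stereotype", "anti-stereotype", "anti-stereotype", "unrelated"],
}

for num_docs, labels in labels_by_num_docs.items():
    print(num_docs, stereotype_rate(labels))
```

Under this assumed definition, a rate near 0.5 indicates no preference between the two response types, while values above or below 0.5 indicate a lean toward stereotype or anti-stereotype responses respectively.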
