Training-Induced Bias Toward LLM-Generated Content in Dense Retrieval

William Xion, Wolfgang Nejdl

TL;DR

Dense retrievers can exhibit source bias toward LLM-generated content, but this bias is not inherent; it emerges and shifts with training. The authors evaluate unsupervised checkpoints, MS MARCO fine-tuning, in-domain fine-tuning, and fine-tuning on LLM-generated corpora across SciFact and NQ320K, using Relative $\Delta$ to quantify preferences. They challenge the perplexity-based explanation by introducing a retriever-centric Perplexity-Relevance Agreement measure and showing that it stays near chance across training stages. MS MARCO fine-tuning consistently induces pro-LLM bias, in-domain fine-tuning yields dataset-specific effects, and fine-tuning on LLM-generated text reinforces pro-LLM bias, which points to training as the driver of source bias. Overall, source bias is a training-induced phenomenon rather than a static property of dense retrievers.

Abstract

Dense retrieval is a promising approach for acquiring relevant context or world knowledge in open-domain natural language processing tasks and is now widely used in information retrieval applications. However, recent reports claim a broad preference for text generated by large language models (LLMs). This bias is called "source bias", and it has been hypothesized that lower perplexity contributes to this effect. In this study, we revisit this claim by conducting a controlled evaluation to trace the emergence of such preferences across training stages and data sources. Using parallel human- and LLM-generated counterparts of the SciFact and Natural Questions (NQ320K) datasets, we compare unsupervised checkpoints with models fine-tuned using in-domain human text, in-domain LLM-generated text, and MS MARCO. Our results show the following: 1) Unsupervised retrievers do not exhibit a uniform pro-LLM preference. The direction and magnitude depend on the dataset. 2) Across the settings tested, supervised fine-tuning on MS MARCO consistently shifts the rankings toward LLM-generated text. 3) In-domain fine-tuning produces dataset-specific and inconsistent shifts in preference. 4) Fine-tuning on LLM-generated corpora induces a pronounced pro-LLM bias. Finally, a retriever-centric perplexity probe involving the reattachment of a language modeling head to the fine-tuned dense retriever encoder indicates agreement with relevance near chance, thereby weakening the explanatory power of perplexity. Our study demonstrates that source bias is a training-induced phenomenon rather than an inherent property of dense retrievers.
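The TL;DR names Relative $\Delta$ as the preference measure but this summary does not spell out its formula. The sketch below assumes the definition commonly used in earlier source-bias work: the difference between a ranking metric computed on human-written targets and on LLM-generated targets, normalized by their mean and expressed as a percentage. The function name, argument names, and the sign convention (negative means the retriever favors LLM-generated text) are assumptions for illustration, not taken from the paper.

```python
def relative_delta(metric_human: float, metric_llm: float) -> float:
    """Relative percentage difference between two ranking metrics.

    ASSUMPTION: the formula follows prior source-bias work, not this summary:
        (metric_human - metric_llm) / mean(metric_human, metric_llm) * 100
    Under this convention a negative value means LLM-generated passages
    are ranked higher than their human-written counterparts.
    """
    mean = (metric_human + metric_llm) / 2.0
    if mean == 0:
        # Both metrics are zero, so there is no preference to report.
        return 0.0
    return (metric_human - metric_llm) / mean * 100.0


if __name__ == "__main__":
    # Hypothetical metric values (e.g. NDCG@k), chosen only to show the sign.
    print(relative_delta(0.40, 0.60))  # negative: pro-LLM preference
    print(relative_delta(0.60, 0.40))  # positive: pro-human preference
```

Normalizing by the mean keeps the measure symmetric in the two sources, so swapping the arguments only flips the sign.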

Paper Structure

This paper contains 20 sections, 1 equation, 1 figure, and 6 tables.

Figures (1)

  • Figure 1: Perplexity–Relevance Agreement across training stages for E5 and Contriever. The dashed line marks 50% (chance). Prior work hypothesized that lower-perplexity passages should receive higher relevance scores, yet the agreement rates remain close to or below chance for all models and datasets. Perplexity, even when measured from the retriever's own encoder, therefore fails to account for the observed bias.
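
The caption describes an agreement rate compared against a 50% chance line. One plausible reading, sketched below, is a pairwise check: for each human/LLM passage pair, does the passage with lower perplexity also get the higher relevance score? This is an assumed formulation for illustration only; the paper's exact procedure, the function name, the tie handling, and the input layout are not given in this summary.

```python
import math
from typing import Sequence


def perplexity_relevance_agreement(
    ppl_human: Sequence[float],
    ppl_llm: Sequence[float],
    score_human: Sequence[float],
    score_llm: Sequence[float],
) -> float:
    """Percentage of pairs where lower perplexity coincides with higher relevance.

    ASSUMPTION: index i in every sequence refers to the same query and the
    same human/LLM passage pair. Pairs with a tie in perplexity or in score
    are skipped, which is a design choice of this sketch.
    """
    agree = 0
    total = 0
    for ph, pl, sh, sl in zip(ppl_human, ppl_llm, score_human, score_llm):
        if ph == pl or sh == sl:
            continue  # no strict ordering to compare
        total += 1
        llm_has_lower_ppl = pl < ph
        llm_has_higher_score = sl > sh
        if llm_has_lower_ppl == llm_has_higher_score:
            agree += 1
    if total == 0:
        return math.nan
    return 100.0 * agree / total


if __name__ == "__main__":
    # Hypothetical values, not results from the paper.
    rate = perplexity_relevance_agreement(
        ppl_human=[10.0, 8.0, 12.0, 5.0],
        ppl_llm=[7.0, 9.0, 6.0, 4.0],
        score_human=[0.5, 0.7, 0.9, 0.2],
        score_llm=[0.6, 0.4, 0.8, 0.2],
    )
    print(f"agreement: {rate:.1f}%")
```

Under this reading, a rate near 50% means perplexity ordering carries no more information about the retriever's ranking than a coin flip, which matches how the caption interprets the dashed line.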