Spiral of Silence: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering
Xiaoyang Chen, Ben He, Hongyu Lin, Xianpei Han, Tianshu Wang, Boxi Cao, Le Sun, Yingfei Sun
TL;DR
The paper examines how LLM-generated content, once it is continuously indexed by web retrieval systems, can alter Retrieval-Augmented Generation (RAG) performance in Open-Domain Question Answering (ODQA). It introduces an iterative simulation pipeline that adds LLM-generated texts to the corpus and evaluates retrieval and QA across multiple backends, languages, and LLMs. Key findings show immediate retrieval gains from LLM-generated content but a long-term degradation of retrieval quality, while QA performance stays stable and LLM-generated content increasingly dominates the top results. This signals a Spiral of Silence in which human content is increasingly marginalized. The work underscores risks to information diversity and reliability in AI-assisted IR and motivates interventions to preserve diversity and accuracy in search ecosystems.
Abstract
The practice of Retrieval-Augmented Generation (RAG), which integrates Large Language Models (LLMs) with retrieval systems, has become increasingly prevalent. However, the repercussions of LLM-derived content infiltrating the web and influencing the retrieval-generation feedback loop remain largely uncharted. In this study, we construct and iteratively run a simulation pipeline to investigate in depth the short-term and long-term effects of LLM text on RAG systems. Taking the trending Open Domain Question Answering (ODQA) task as a point of entry, our findings reveal a potential digital "Spiral of Silence" effect, with LLM-generated text consistently outperforming human-authored content in search rankings, thereby diminishing the presence and impact of human contributions online. This trend risks creating an imbalanced information ecosystem, where the unchecked proliferation of erroneous LLM-generated content may result in the marginalization of accurate information. We urge the academic community to take heed of this potential issue and to help ensure a diverse and authentic digital information landscape.
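
The retrieval-generation feedback loop described in the abstract can be illustrated with a small toy simulation. This is only a sketch under strong assumptions, not the authors' pipeline: `score` is a token-overlap stand-in for a real retriever, `fake_llm` is a hypothetical placeholder for an LLM call that restates the query and borrows from the top retrieved document, and the three-document corpus is invented for illustration. Because the placeholder echoes the query, generated text is favored by the toy ranker by construction, so the output mirrors the reported trend rather than providing evidence for it.

```python
from collections import Counter


def score(query, text):
    """Toy lexical relevance: count of shared tokens (stand-in for a real retriever)."""
    q = Counter(query.lower().split())
    d = Counter(text.lower().split())
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, corpus, k):
    """Return the k highest-scoring documents; ties keep corpus order (older first)."""
    ranked = sorted(corpus, key=lambda doc: score(query, doc["text"]), reverse=True)
    return ranked[:k]


def fake_llm(query, contexts):
    """Hypothetical LLM placeholder: restates the query and copies the top context."""
    borrowed = contexts[0]["text"] if contexts else ""
    return f"{query} {borrowed}"


def llm_share(docs):
    """Fraction of the given documents that were machine-generated."""
    return sum(d["source"] == "llm" for d in docs) / len(docs)


def simulate(queries, corpus, iterations=3, k=2):
    """Run the loop: retrieve, generate, index the generated text, repeat."""
    corpus = list(corpus)  # do not mutate the caller's list
    history = []
    for _ in range(iterations):
        shares, generated = [], []
        for q in queries:
            top = retrieve(q, corpus, k)
            shares.append(llm_share(top))
            generated.append({"text": fake_llm(q, top), "source": "llm"})
        corpus.extend(generated)  # generated answers enter the index
        history.append(sum(shares) / len(shares))
    return history


# Invented example data, for illustration only.
queries = ["who wrote hamlet", "capital of france"]
human_corpus = [
    {"text": "hamlet was written by william shakespeare", "source": "human"},
    {"text": "paris is the capital city of france", "source": "human"},
    {"text": "the eiffel tower is in paris", "source": "human"},
]
history = simulate(queries, human_corpus, iterations=3, k=2)
print(history)  # average share of LLM documents in the top-k, per iteration
```

Each entry of `history` is the average fraction of machine-generated documents among the top-k results in one iteration. A real study would replace the placeholders with actual retrievers and LLMs and would also track retrieval and QA accuracy, which this sketch does not attempt.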
