Echoes in the Loop: Diagnosing Risks in LLM-Powered Recommender Systems under Feedback Loops

Donguk Park, Dongwon Lee, Yeon-Chang Lee

TL;DR

This work investigates the systemic risks that arise when large language models (LLMs) are embedded in recommender systems (RS) and interact with users through closed-loop feedback. It introduces EchoTrace, a role-aware, phase-wise diagnostic framework built on a controlled feedback-loop pipeline that measures bias and hallucination at three levels: LLM-generated content, ranking decisions, and ecosystem dynamics over time. Phase-wise experiments on MovieLens-1M and Amazon-Books show that LLM-based components can amplify popularity bias, inject hallucinated content into user and item representations, and drive long-term polarization in embeddings beyond what traditional RS dynamics produce. The authors plan to release the framework as an open-source toolkit to support systematic risk analysis and mitigation across diverse LLM-powered recommender architectures.

Abstract

Large language models (LLMs) are increasingly embedded into recommender systems, where they operate across multiple functional roles such as data augmentation, profiling, and decision making. While prior work emphasizes recommendation performance, the systemic risks of LLMs, such as bias and hallucination, and their propagation through feedback loops remain largely unexplored. In this paper, we propose a role-aware, phase-wise diagnostic framework that traces how these risks emerge, manifest in ranking outcomes, and accumulate over repeated recommendation cycles. We formalize a controlled feedback-loop pipeline that simulates long-term interaction dynamics and enables empirical measurement of risks at the LLM-generated content, ranking, and ecosystem levels. Experiments on widely used benchmarks demonstrate that LLM-based components can amplify popularity bias, introduce spurious signals through hallucination, and lead to polarized and self-reinforcing exposure patterns over time. We plan to release our framework as an open-source toolkit to facilitate systematic risk analysis across diverse LLM-powered recommender systems.
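To make "amplify popularity bias" concrete, one widely used measure is average recommendation popularity (ARP): the mean, over users, of the average training-set popularity of the items recommended to them. The sketch below is an illustrative assumption, not necessarily the metric EchoTrace reports, and `item_popularity` and `average_rec_popularity` are hypothetical helpers. Computing ARP after each feedback-loop round and seeing it rise would be one signal that exposure is concentrating on already-popular items.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple


def item_popularity(interactions: List[Tuple[int, int]]) -> Counter:
    """Count interactions per item from (user_id, item_id) pairs."""
    return Counter(item for _, item in interactions)


def average_rec_popularity(recs: Dict[int, List[int]], pop: Counter) -> float:
    """Mean over users of the mean popularity of each user's recommended items.

    Items absent from `pop` count as popularity 0; users with empty lists are skipped.
    """
    per_user = [mean(pop[i] for i in items) for items in recs.values() if items]
    return float(mean(per_user)) if per_user else 0.0


# Toy usage: item 10 is popular (3 interactions), item 11 is not (1 interaction).
pop = item_popularity([(1, 10), (2, 10), (3, 10), (1, 11)])
arp_round_1 = average_rec_popularity({1: [10, 11], 2: [11]}, pop)
arp_round_2 = average_rec_popularity({1: [10], 2: [10]}, pop)
```

Here `arp_round_1` is 1.5 and `arp_round_2` is 3.0: the second round recommends only the popular item, so its ARP doubles.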

Paper Structure (19 sections, 1 equation, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Annual trend of publications on LLM-powered recommender systems in top-tier data mining and information retrieval conferences such as SIGIR, KDD, WWW, CIKM, and ICDM. While this survey focuses on these venues, the overall volume of research is substantially larger when accounting for additional conferences, journals, and preprints.
  • Figure 2: Taxonomy of LLM roles in LLM4RS, showing the distribution of all surveyed studies across role categories.
  • Figure 3: Overview of the proposed diagnostic framework under a controlled feedback-loop pipeline. The framework consists of three phases: (P1) LLM Content Generation, (P2) Recommendation, and (P3) Feedback Loop.
  • Figure 4: Overview of the controlled feedback-loop simulation pipeline. The timeline after the cutoff time $t$ is partitioned into $N$ consecutive periods based on the ground-truth interaction log. In each period, recommendations are generated only for active users identified from the ground truth.
  • Figure 5: Distribution analysis of user profile attributes inferred by LLM-as-Representer for ML-1M and A-Books.
  • ...and 5 more figures
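The controlled pipeline described in Figure 4 can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the EchoTrace implementation: the names `partition_periods`, `run_feedback_loop`, and `most_popular` are hypothetical; the N periods are assumed to be equal-width time windows (the caption only says they are derived from the ground-truth log); and the feedback rule that writes recommended items back into the history is an illustrative choice. The LLM-based phases (P1 content generation and P2 recommendation) are abstracted into a single `recommend` callable.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Interaction = Tuple[int, int, float]  # (user_id, item_id, timestamp)
Recommender = Callable[[int, List[Interaction]], List[int]]


def partition_periods(log: List[Interaction], cutoff: float,
                      n_periods: int) -> List[List[Interaction]]:
    """Split interactions after `cutoff` into `n_periods` equal-width time windows (assumed)."""
    future = [x for x in log if x[2] > cutoff]
    periods: List[List[Interaction]] = [[] for _ in range(n_periods)]
    if not future:
        return periods
    width = (max(ts for _, _, ts in future) - cutoff) / n_periods
    for user, item, ts in future:
        # Clamp so the latest timestamp falls in the last period.
        idx = min(int((ts - cutoff) / width), n_periods - 1)
        periods[idx].append((user, item, ts))
    return periods


def run_feedback_loop(log: List[Interaction], cutoff: float, n_periods: int,
                      recommend: Recommender, k: int = 10) -> List[Dict[int, List[int]]]:
    """Simulate N rounds; return each round's top-k list per active user."""
    history = [x for x in log if x[2] <= cutoff]
    rounds: List[Dict[int, List[int]]] = []
    for period in partition_periods(log, cutoff, n_periods):
        # Only users active in the ground truth for this period are served.
        active = sorted({u for u, _, _ in period})
        recs = {u: recommend(u, history)[:k] for u in active}
        rounds.append(recs)
        if period:
            # Hypothetical feedback rule: recommended items re-enter the history.
            t_end = max(ts for _, _, ts in period)
            history += [(u, i, t_end) for u, items in recs.items() for i in items]
    return rounds


def most_popular(user: int, history: List[Interaction]) -> List[int]:
    """Toy stand-in recommender: most-interacted items the user has not seen."""
    seen = {i for u, i, _ in history if u == user}
    counts = Counter(i for _, i, _ in history)
    return [i for i, _ in counts.most_common() if i not in seen]


# Toy usage: cutoff t = 5.0, N = 2 periods, top-2 lists.
log = [(1, 10, 1.0), (2, 10, 2.0), (2, 11, 3.0),
       (1, 12, 6.0), (3, 11, 9.0), (2, 13, 10.0)]
rounds = run_feedback_loop(log, cutoff=5.0, n_periods=2, recommend=most_popular, k=2)
```

With this toy log, `rounds` is `[{1: [11]}, {2: [], 3: [10, 11]}]`: user 1 is the only active user in the first period, and users 2 and 3 in the second. Swapping `most_popular` for an LLM-backed recommender and logging each round's lists is one way to observe how exposure drifts across periods.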