SimUSER: Simulating User Behavior with Large Language Models for Recommender System Evaluation

Nicolas Bougie, Narimasa Watanabe

TL;DR

The paper tackles the persistent gap between offline metrics and real user behavior in recommender systems by introducing SimUSER, a scalable, cost-effective approach that uses LLM-based agents as believable human proxies. It presents a two-phase framework: persona matching based on self-consistency, followed by persona-driven interaction with a retrieval-augmented recommender system (RS), supported by memory and perception modules. Key contributions include a graph-structured memory, PathSim-based retrieval, multimodal perception via thumbnails, and multi-round causal action refinement. These are validated on MovieLens, AmazonBook, and Steam, where they show closer alignment with human behavior and improved correlation between offline and online metrics. The approach offers a practical, extensible way to bridge offline evaluation and real-world engagement, supporting RS development and parameter tuning without extensive online testing.

Abstract

Recommender systems play a central role in numerous real-life applications, yet evaluating their performance remains a significant challenge due to the gap between offline metrics and online behaviors. Given the scarcity and limits (e.g., privacy issues) of real user data, we introduce SimUSER, an agent framework that serves as believable and cost-effective human proxies. SimUSER first identifies self-consistent personas from historical data, enriching user profiles with unique backgrounds and personalities. Then, central to this evaluation are users equipped with persona, memory, perception, and brain modules, engaging in interactions with the recommender system. SimUSER exhibits closer alignment with genuine humans than prior work, both at micro and macro levels. Additionally, we conduct insightful experiments to explore the effects of thumbnails on click rates, the exposure effect, and the impact of reviews on user engagement. Finally, we refine recommender system parameters based on offline A/B test results, resulting in improved user engagement in the real world.

Paper Structure

This paper contains 47 sections, 3 equations, 13 figures, 11 tables, 1 algorithm.

Figures (13)

  • Figure 1: Spearman correlation between estimated and actual engagement metrics. Higher values indicate better alignment with ground truth metrics.
  • Figure 2: The SimUSER framework for evaluating a movie recommender system.
  • Figure 3: Comparison of rating distributions between ground-truth and human proxies.
  • Figure 4: Ratings vs feelings on IMDB dataset. Comparison between human (top left) and LLM-empowered agents.
  • Figure 5: Preference coherence (accept/reject task). 'I' stands for incoherent; 'C' stands for coherent (Reddit dataset).
  • ...and 8 more figures