AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms
Yuwei Yan, Yu Shang, Qingbin Zeng, Yu Li, Keyu Zhao, Zhiheng Zheng, Xuefei Ning, Tianji Wu, Shengen Yan, Yu Wang, Fengli Xu, Yong Li
TL;DR
The paper presents the AgentSociety Challenge, a two-track benchmark for LLM agents in user modeling and recommendation on web platforms, built on Yelp, Amazon, and Goodreads data with an InteractionTool-based environment. It details a two-phase evaluation that uses simulated and real ground truth, and defines three metrics: MAE for preference estimation, a composite Review Generation Error, and HR@N for ranking. Competition results show significant improvements over baselines and highlight design patterns such as memory-augmented prompting and platform-specific feature engineering. The study validates the mixed ground-truth approach as a reliable predictor of real-world performance and provides an open-source benchmarking suite to catalyze future work in user-centric IR and recommender systems. The work underscores the potential of LLM agents to bridge user modeling and recommendation on dynamic web platforms.
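To make two of the metrics named above concrete, here is a minimal Python sketch of MAE and HR@N using their standard textbook definitions. The function names, argument shapes, and sample values are illustrative assumptions; the Challenge's own evaluation code may normalize ratings or aggregate differently, and the composite Review Generation Error is not reproduced here because its components are not specified in this summary.

```python
from typing import Hashable, Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap between predicted and true ratings (standard MAE)."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def hit_rate_at_n(
    ranked_lists: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    n: int,
) -> float:
    """Fraction of cases whose ground-truth item appears in the top-n candidates."""
    if len(ranked_lists) != len(targets) or len(targets) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:n]
    )
    return hits / len(targets)


# Hypothetical data: three star-rating predictions and two ranking cases.
mae = mean_absolute_error([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])
hr_at_2 = hit_rate_at_n([["a", "b", "c"], ["x", "y", "z"]], ["b", "z"], n=2)
```

Lower MAE is better and higher HR@N is better, so any leaderboard that combines the two needs to account for their opposite directions.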
Abstract
The AgentSociety Challenge is the first competition at the Web Conference that aims to explore the potential of Large Language Model (LLM) agents in modeling user behavior and enhancing recommender systems on web platforms. The Challenge consists of two tracks: the User Modeling Track and the Recommendation Track. Participants are tasked with using a combined dataset from Yelp, Amazon, and Goodreads, along with an interactive environment simulator, to develop innovative LLM agents. The Challenge attracted 295 teams from across the globe and received over 1,400 submissions during its 37 official competition days. Participants achieved performance improvements of 21.9% and 20.3% on Track 1 and Track 2, respectively, in the Development Phase, and 9.1% and 15.9% in the Final Phase, a significant gain in both phases. This paper discusses the detailed design of the Challenge, analyzes the outcomes, and highlights the most successful LLM agent designs. To support further research and development, we have open-sourced the benchmark environment at https://tsinghua-fib-lab.github.io/AgentSocietyChallenge.
