Large Language Models Empowered Personalized Web Agents
Hongru Cai, Yongqi Li, Wenjie Wang, Fengbin Zhu, Xiaoyu Shen, Wenjie Li, Tat-Seng Chua
TL;DR
The paper tackles the lack of personalization in LLM-driven Web agents by introducing PersonalWAB, a benchmark covering three personalized Web tasks, and by proposing PUMA, a memory-augmented alignment framework. PUMA combines a long-term user memory, task-specific retrieval, heuristic fine-tuning, and Direct Preference Optimization (DPO) to generate personalized Web function parameters. On PersonalWAB, PUMA consistently outperforms existing Web agents in both single-turn and multi-turn settings while delivering substantial efficiency gains. The work advances personalized Web automation and provides a scalable, evaluable blueprint for integrating user data into LLM-based agents, with attention to ethical and privacy considerations. Taken together, the task formulation, the dedicated benchmark, and the memory-driven alignment framework offer a principled path toward more intelligent, user-centered Web agents in shopping and beyond.
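The summary does not say how the long-term user memory or its task-specific retrieval is implemented. The following minimal Python sketch illustrates the general idea under stated assumptions: the `Behavior` record, the task labels ("search", "recommend", "review"), the filter-by-task rule, and the bag-of-words cosine scorer are all hypothetical stand-ins, not PUMA's actual components (a real system would more likely rank with a learned embedding model).

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Behavior:
    task: str  # hypothetical task label, e.g. "search", "recommend", "review"
    text: str  # free-text description of one historical Web behavior


def tokenize(s: str) -> Counter:
    # Lowercased whitespace tokens, counted as a bag of words.
    return Counter(s.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a)  # missing keys in a Counter count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class MemoryBank:
    """Hypothetical long-term store of one user's Web behaviors."""

    def __init__(self) -> None:
        self.items: list[Behavior] = []

    def add(self, behavior: Behavior) -> None:
        self.items.append(behavior)

    def retrieve(self, instruction: str, task: str, k: int = 3) -> list[Behavior]:
        # Assumed task-specific step: keep only behaviors of the same task type.
        candidates = [b for b in self.items if b.task == task]
        query = tokenize(instruction)
        # Rank the remaining behaviors by similarity to the instruction.
        ranked = sorted(candidates, key=lambda b: cosine(query, tokenize(b.text)), reverse=True)
        return ranked[:k]


# Usage with made-up behaviors.
bank = MemoryBank()
bank.add(Behavior("review", "wrote a positive review of wireless earbuds"))
bank.add(Behavior("search", "searched for wireless earbuds with noise cancelling"))
bank.add(Behavior("search", "searched for a cast iron skillet"))
bank.add(Behavior("recommend", "clicked on a wireless earbuds recommendation"))

top = bank.retrieve("find wireless earbuds", task="search", k=1)
all_search = bank.retrieve("find wireless earbuds", task="search", k=5)
print([b.text for b in top])
```

Filtering by task before ranking keeps, say, past reviews from crowding out past searches when the agent is filling parameters for a search function; the retrieved behaviors would then be passed to the LLM as context.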
Abstract
Web agents have emerged as a promising direction for automating Web task completion based on user instructions, significantly enhancing user experience. Recently, Web agents have evolved from traditional agents to agents based on Large Language Models (LLMs). Despite their success, existing LLM-based Web agents overlook the importance of personalized data (e.g., user profiles and historical Web behaviors) in helping them understand users' personalized instructions and execute customized actions. To overcome this limitation, we first formulate the task of LLM-empowered personalized Web agents, which integrate personalized data and user instructions to personalize both instruction comprehension and action execution. To address the absence of a comprehensive evaluation benchmark, we construct the Personalized Web Agent Benchmark (PersonalWAB), which features user instructions, personalized user data, Web functions, and two evaluation paradigms across three personalized Web tasks. Moreover, we propose a Personalized User Memory-enhanced Alignment (PUMA) framework to adapt LLMs to the personalized Web agent task. PUMA uses a memory bank with a task-specific retrieval strategy to filter relevant historical Web behaviors. Based on these behaviors, PUMA then aligns LLMs for personalized action execution through fine-tuning and direct preference optimization. Extensive experiments on PersonalWAB validate the superiority of PUMA over existing Web agents.
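For reference, the standard Direct Preference Optimization objective is shown below. The abstract does not state how PUMA constructs its preference data, so reading $x$ as an instruction plus retrieved user behaviors, $y_w$ and $y_l$ as preferred and dispreferred Web function parameters, and $\pi_{\mathrm{ref}}$ as the model after the fine-tuning stage is an assumption made here for illustration; $\sigma$ is the logistic function and $\beta$ scales the implicit reward.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Minimizing this loss raises the policy's likelihood of $y_w$ relative to $y_l$, measured against the reference model, without training a separate reward model.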
