AgenticShop: Benchmarking Agentic Product Curation for Personalized Web Shopping

Sunghwan Kim, Ryang Heo, Yongsik Seo, Jinyoung Yeo, Dongha Lee

TL;DR

AgenticShop introduces the first open-web benchmark for personalized product curation, integrating realistic shopping intents, diverse user profiles, and a checklist-driven evaluation grounded by verifiable evidence extracted from product pages. The framework enables automated LLM-based judging of whether curated outputs satisfy user-specific constraints, providing a scalable measure of personalization across domains. Experimental results show current agentic systems struggle to achieve robust personalization, highlighting challenges such as grounding reliability, dynamic pricing, and subjective aesthetic factors, and outlining directions to enhance exploration, evaluation, and attribution grounding. Overall, AgenticShop offers a rigorous evaluation platform that can drive progress in user-side, personalized web shopping agents and cross-domain product curation.

Abstract

The proliferation of e-commerce has made web shopping platforms key gateways for customers navigating the vast digital marketplace. Yet this rapid expansion has led to a noisy and fragmented information environment, increasing cognitive burden as shoppers explore and purchase products online. With promising potential to alleviate this challenge, agentic systems have garnered growing attention for automating user-side tasks in web shopping. Despite significant advancements, existing benchmarks fail to comprehensively evaluate how well agentic systems can curate products in open-web settings. Specifically, they have limited coverage of shopping scenarios, focusing only on simplified single-platform lookups rather than exploratory search. Moreover, they overlook personalization in evaluation, leaving it unclear whether agents can adapt to diverse user preferences in realistic shopping contexts. To address this gap, we present AgenticShop, the first benchmark for evaluating agentic systems on personalized product curation in open-web environments. Crucially, our approach features realistic shopping scenarios, diverse user profiles, and a verifiable, checklist-driven personalization evaluation framework. Through extensive experiments, we demonstrate that current agentic systems remain largely insufficient, emphasizing the need for user-side systems that effectively curate tailored products across the modern web.

Paper Structure (30 sections, 3 equations, 7 figures, 9 tables)

Figures (7)

  • Figure 1: Conventional web shopping overwhelms users with fragmented information and fatigue (Upper), motivating a context-aware agent that curates personalized options from cross-platform evidence to ease the cognitive burden (Lower).
  • Figure 2: Overview of AgenticShop. For user profile construction, real user purchase histories and review texts are used to build narrative-style user shopping contexts, from which intent-specific queries and personalized checklists are generated. For checklist-driven personalized evaluation, LLM-as-a-judge verifies whether curated results satisfy each user’s shopping context, grounding its decisions in product information extracted from the linked pages of curated products provided by the agents.
  • Figure 3: Performance across product domains.
  • Figure 4: Performance across the six checklist dimensions.
  • Figure 5: Intent distribution of agentic systems.
  • ...and 2 more figures
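
The checklist-driven evaluation described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the `ChecklistItem` structure, the dimension labels, the `keyword_judge` stand-in (a simple keyword match used here in place of an LLM judge), and the scoring rule (fraction of satisfied items) are all assumptions made for illustration. The paper's actual checklist dimensions and metrics are not given in this summary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChecklistItem:
    """One user-specific requirement (hypothetical structure)."""
    dimension: str    # illustrative label only, not the paper's taxonomy
    requirement: str  # natural-language criterion from the user context


# A judge maps (requirement, page evidence) to satisfied / not satisfied.
Judge = Callable[[str, str], bool]


def keyword_judge(requirement: str, evidence: str) -> bool:
    """Stand-in for an LLM judge: every word of the requirement
    must appear somewhere in the product-page evidence."""
    ev = evidence.lower()
    return all(word in ev for word in requirement.lower().split())


def checklist_score(items: List[ChecklistItem], evidence: str, judge: Judge) -> float:
    """Fraction of checklist items the judge marks as satisfied (assumed metric)."""
    if not items:
        return 0.0
    satisfied = sum(judge(item.requirement, evidence) for item in items)
    return satisfied / len(items)


# Hypothetical checklist for one user and text extracted from one product page.
checklist = [
    ChecklistItem("material", "leather strap"),
    ChecklistItem("feature", "water resistant"),
    ChecklistItem("color", "navy"),
]
page_text = "Classic watch with a brown leather strap, water resistant to 50 m."

score = checklist_score(checklist, page_text, keyword_judge)
print(f"{score:.2f}")  # two of the three items are satisfied
```

Passing the judge as a callable keeps the scoring logic independent of how verdicts are produced, so the keyword stand-in could be swapped for a model-backed judge without changing `checklist_score`.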