Shopping Companion: A Memory-Augmented LLM Agent for Real-World E-Commerce Tasks

Zijian Yu; Kejun Xiao; Huaipeng Zhao; Tao Luo; Xiaoyi Zeng

Abstract

In e-commerce, LLM agents show promise for shopping tasks such as recommendations, budgeting, and bundle deals, where accurately capturing user preferences from long-term conversations is critical. However, two challenges hinder realizing this potential: (1) the absence of benchmarks for evaluating long-term preference-aware shopping tasks, and (2) the lack of end-to-end optimization due to existing designs that treat preference identification and shopping assistance as separate components. In this paper, we introduce a novel benchmark with a long-term memory setup, spanning two shopping tasks over 1.2 million real-world products, and propose Shopping Companion, a unified framework that jointly tackles memory retrieval and shopping assistance while supporting user intervention. To train such capabilities, we develop a dual-reward reinforcement learning strategy with tool-wise rewards to handle the sparse and discontinuous rewards inherent in multi-turn interactions. Experimental results demonstrate that even state-of-the-art models (such as GPT-5) achieve success rates under 70% on our benchmark, highlighting the significant challenges in this domain. Notably, our lightweight LLM, trained with Shopping Companion, consistently outperforms strong baselines, achieving better preference capture and task performance, which validates the effectiveness of our unified design.

Paper Structure (51 sections, 11 equations, 12 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Long-Term Memory.
Shopping Agent.
Problem Formulation
Benchmark Construction
Shopping Simulation Environment
Evaluation Methods
Shopping Companion
Two-Stage Agentic Framework
Reward Function Design: Dual-Reward with Tool-Wise Supervision
Dual-reward setup.
Stage-1 reward
Stage-2 reward
Tool-wise reward $R_{\mathrm{tool}}(\tau)$.
...and 36 more sections

Figures (12)

Figure 1: An example of the Shopping Companion framework. Given a user instruction, the agent first performs Preference Identification (Stage 1) by invoking memory tools to retrieve relevant conversation history and extract preferences. These are presented to the user for confirmation, giving the user the ability to intervene. Then, in Shopping Assistance (Stage 2), the agent leverages the confirmed preferences to iteratively search for products and check constraints until the task is completed.
Figure 2: Tool-wise reward effects: (a) tool-wise score and (b) response length over training steps (solid=smoothed, translucent=raw).
Figure 3: Distribution of Product Category (Top 20)
Figure 6: Dialogue Generation Prompt
Figure 7: User Instruction Generation Prompt of Single Product Rec
...and 7 more figures
