Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching
Xiangci Li, Zhiyu Chen, Jason Ingyu Choi, Nikhita Vedula, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi
TL;DR
This work tackles data scarcity in conversational product search (CPS) by introducing TRACER, a framework that grounds dialogue generation in dialogue plans derived from a decision tree. TRACER is used to generate Wizard of Shopping (WoS), a large-scale, natural, target-oriented CPS dataset spanning three shopping domains, validated by human evaluations and downstream experiments. TRACER combines customer preference sampling, tree-guided dialogue planning, and verbalization (interactive or single-pass) to produce coherent conversations with realistic search trajectories that minimize user effort. WoS shows clear benefits on two downstream tasks, Conversational Query Generation and Conversational Product Ranking, and the work includes extensive analysis and comparisons with prior datasets.
Abstract
The goal of conversational product search (CPS) is to develop an intelligent, chat-based shopping assistant that can directly interact with customers to understand shopping intents, ask clarification questions, and find relevant products. However, training such assistants is hindered mainly by the lack of reliable, large-scale datasets. Prior human-annotated CPS datasets are extremely small and lack integration with real-world product search systems. We propose a novel approach, TRACER, which leverages large language models (LLMs) to generate realistic and natural conversations for different shopping domains. TRACER's novelty lies in grounding the generation in dialogue plans: product search trajectories predicted by a decision tree model, which guarantee relevant product discovery in the fewest search conditions. We also release the first target-oriented CPS dataset, Wizard of Shopping (WoS), containing highly natural and coherent conversations (3.6k) from three shopping domains. Finally, we demonstrate the quality and effectiveness of WoS via human evaluations and downstream tasks.
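
To make the idea of a dialogue plan concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy catalog and a sampled customer preference, and it substitutes a simple greedy entropy-based attribute choice for the paper's decision tree model. The catalog, the attribute names, and the functions `split_entropy` and `plan_dialogue` are all hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical toy catalog; attribute names and values are illustrative only.
CATALOG = [
    {"id": "p1", "type": "sneaker", "color": "black", "brand": "A"},
    {"id": "p2", "type": "sneaker", "color": "white", "brand": "B"},
    {"id": "p3", "type": "boot", "color": "black", "brand": "A"},
    {"id": "p4", "type": "boot", "color": "brown", "brand": "C"},
    {"id": "p5", "type": "sandal", "color": "brown", "brand": "B"},
    {"id": "p6", "type": "sandal", "color": "white", "brand": "C"},
]
ATTRIBUTES = ["type", "color", "brand"]


def split_entropy(products, attribute):
    """Entropy of the value distribution of `attribute` over `products`."""
    counts = Counter(p[attribute] for p in products)
    total = len(products)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def plan_dialogue(catalog, preferences, max_results=1):
    """Greedily pick search conditions until at most `max_results` products remain.

    Assumption: the most informative attribute is the one whose values split
    the remaining candidates most evenly (highest entropy), as a decision tree
    split criterion would favor.
    """
    candidates = list(catalog)
    remaining = [a for a in ATTRIBUTES if a in preferences]
    plan = []
    while len(candidates) > max_results and remaining:
        best = max(remaining, key=lambda a: split_entropy(candidates, a))
        if split_entropy(candidates, best) == 0:
            break  # no remaining attribute can narrow the candidates further
        remaining.remove(best)
        # Follow the branch that matches the sampled customer preference.
        plan.append((best, preferences[best]))
        candidates = [p for p in candidates if p[best] == preferences[best]]
    return plan, candidates


# Sampled customer preference (here, the attributes of a target product).
preferences = {"type": "boot", "color": "brown", "brand": "C"}
plan, results = plan_dialogue(CATALOG, preferences)
print(plan)
print([p["id"] for p in results])
```

Each `(attribute, value)` pair in `plan` corresponds to one search condition that a generated conversation would need to surface, for example through a clarification question and the customer's answer. Verbalizing the plan into natural dialogue turns is left to the LLM and is not shown here.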
