PSCon: Product Search Through Conversations

Jie Zou, Mohammad Aliannejadi, Evangelos Kanoulas, Shuxi Han, Heli Ma, Zheng Wang, Yang Yang, Heng Tao Shen

TL;DR

This work addresses the lack of real, multilingual CPS data by introducing PSCon, a coached human–human CPS dataset collected across two markets and two languages. It defines a six-task CPS pipeline and provides a Transformer-based baseline that handles user intent detection, keyword extraction, system action prediction, question selection, item ranking, and response generation, aided by a knowledge graph. Pre-training on existing knowledge-grounded dialogue data and fine-tuning on PSCon demonstrate the dataset’s utility for comprehensive CPS research, with strong performance on most subtasks and insights into where large language models excel or struggle. The dataset, model, and analysis offer a practical foundation for advancing cross-market, multilingual CPS systems and invite future work on data scale, cross-lingual transfer, and retrieval-augmented generation integrations.

Abstract

Conversational Product Search (CPS) systems interact with users via natural language to offer personalized and context-aware product lists. However, most existing research on CPS is limited to simulated conversations because no real CPS dataset driven by human-like language exists. Moreover, existing conversational datasets for e-commerce are constructed for a particular market or a particular language and thus cannot support cross-market and multilingual usage. In this paper, we propose a CPS data collection protocol and create a new CPS dataset, called PSCon, which assists product search through conversations with human-like language. The dataset is collected with a coached human-human data collection protocol and covers two markets and two languages. By formulating the task of CPS, the dataset supports comprehensive and in-depth research on six subtasks: user intent detection, keyword extraction, system action prediction, question selection, item ranking, and response generation. We also present a concise analysis of the dataset and propose a benchmark model on it. We expect the proposed dataset and model to facilitate future research on CPS.
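To make the six subtasks concrete, the toy sketch below shows one way they might chain within a single system turn. Everything in it is an assumption for illustration: the `run_turn` function, the `TurnOutput` fields, the "reveal"/"ask" intent labels, the "clarify"/"recommend" action labels, the stopword list, and the keyword-overlap ranking are hypothetical rule-based stand-ins. They are not the PSCon annotation schema and not the paper's Transformer-based benchmark model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical stopword list used for toy keyword extraction.
STOPWORDS = {"i", "want", "a", "an", "the", "for", "some", "need", "please"}


@dataclass
class TurnOutput:
    intent: str                       # subtask 1: user intent detection
    keywords: List[str]               # subtask 2: keyword extraction
    action: str                       # subtask 3: system action prediction
    question: Optional[str] = None    # subtask 4: question selection
    ranked_items: List[str] = field(default_factory=list)  # subtask 5: item ranking
    response: str = ""                # subtask 6: response generation


def run_turn(utterance: str, catalog: Dict[str, str], questions: List[str]) -> TurnOutput:
    """Hypothetical rule-based pass over the six CPS subtasks for one user utterance."""
    tokens = [t.strip(".,!?") for t in utterance.lower().split()]
    # 1. Toy intent rule: an utterance ending in "?" is treated as a question.
    intent = "ask" if utterance.strip().endswith("?") else "reveal"
    # 2. Keywords: every non-empty token that is not a stopword.
    keywords = [t for t in tokens if t and t not in STOPWORDS]
    # 3. Toy action rule: ask a clarifying question when fewer than two keywords are known.
    action = "clarify" if len(keywords) < 2 else "recommend"
    question: Optional[str] = None
    ranked: List[str] = []
    if action == "clarify":
        # 4. Question selection: first candidate that does not mention a known keyword.
        question = next((q for q in questions if not any(k in q.lower() for k in keywords)), None)
        response = question or "Could you tell me more about what you need?"
    else:
        # 5. Item ranking: count keywords that appear in each item title, drop zero scores,
        #    and sort by descending score, breaking ties by item id.
        scores = {item_id: sum(k in title.lower().split() for k in keywords)
                  for item_id, title in catalog.items()}
        ranked = sorted((i for i, s in scores.items() if s > 0), key=lambda i: (-scores[i], i))
        # 6. Response generation: a fixed template over the top three items.
        response = (f"You might like: {', '.join(ranked[:3])}" if ranked
                    else "Sorry, I found nothing matching.")
    return TurnOutput(intent, keywords, action, question, ranked, response)
```

For example, with a catalog of `{"p1": "Red running shoes", "p2": "Blue running shorts", "p3": "Red wool scarf"}`, the utterance "I want running shoes" yields the action "recommend" and the ranking `["p1", "p2"]`, while "Shoes?" yields "clarify" with a selected question. A learned system would replace each rule with a trained predictor, but the turn-level structure stays the same.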

Paper Structure

This paper contains 34 sections, 3 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Study protocol overview.
  • Figure 2: The interface of the chat room (the system role).
  • Figure 3: A conversation example of our collected dataset.
  • Figure 4: Intent/action distribution in the Chinese dataset.
  • Figure 5: # turns vs. # liked products.
  • ...and 1 more figure