
PerSHOP -- A Persian dataset for shopping dialogue systems modeling

Keyvan Mahmoudi, Heshaam Faili

TL;DR

PerSHOP presents the first open Persian shopping dialogue dataset collected via crowd-sourcing, addressing the lack of Persian-language data for task-oriented systems. It combines a two-stage data collection and annotation workflow with baseline NLU models (DIETClassifier, ParsBERT, LaBSE, and CRF) to establish a benchmark for intent classification and entity extraction. The dataset contains about 22k utterances in 1,061 dialogues across 15 domains, with a rich ontology of 750 products and 36 feature slots that enables realistic shopping interactions. The work lays the groundwork for scaling Persian shopping dialogue research through data expansion, paraphrasing, translation from high-resource languages, and the development of end-to-end conversational models with practical impact for Persian-speaking users.
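Sequence taggers such as a CRF commonly consume entity annotations as per-token BIO labels (B = begin, I = inside, O = outside) rather than as spans. The following is a minimal sketch of that conversion, assuming annotations are stored as (start token, end token exclusive, slot) triples over whitespace tokens. The English-gloss utterance, the slot names `color` and `product_name`, and both helper functions are hypothetical illustrations, not the PerSHOP file format or the authors' preprocessing.

```python
from typing import List, Optional, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, slot_name).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level entity spans into BIO tags for a sequence tagger."""
    tags = ["O"] * len(tokens)
    for start, end, slot in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{slot}"
        for i in range(start + 1, end):
            tags[i] = f"I-{slot}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from BIO tags (inverse of spans_to_bio for well-formed tags)."""
    spans: List[Span] = []
    start: Optional[int] = None
    slot: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last open span
        if tag.startswith("I-") and slot == tag[2:]:
            continue  # still inside the current entity
        if start is not None:
            spans.append((start, i, slot))
            start, slot = None, None
        if tag.startswith(("B-", "I-")):
            start, slot = i, tag[2:]
    return spans


# Hypothetical English gloss of a shopping utterance.
tokens = ["I", "want", "a", "red", "Samsung", "Galaxy", "phone"]
spans = [(3, 4, "color"), (4, 6, "product_name")]
tags = spans_to_bio(tokens, spans)
print(list(zip(tokens, tags)))
```

Keeping the conversion reversible makes it easy to score a tagger's BIO output at the entity level.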

Abstract

Nowadays, dialogue systems are used in many fields of industry and research. There are successful instances of these systems, such as Apple Siri, Google Assistant, and IBM Watson. Task-oriented dialogue systems are a category of these systems built for specific tasks, such as booking plane tickets or making restaurant reservations. Shopping is one of the most popular application areas for these systems: the bot replaces the human salesperson and interacts with customers through conversation. Training the models behind these systems requires annotated data. In this paper, we developed a dataset of Persian-language dialogues through crowd-sourcing and annotated these dialogues to train a model. The dataset contains nearly 22k utterances in 1,061 dialogues across 15 different domains. It is the largest Persian dataset in this field and is freely available to future researchers. We also propose baseline models for two natural language understanding (NLU) tasks: intent classification and entity extraction. These models obtain F1 scores of around 91% for intent classification and around 93% for entity extraction, which can serve as a baseline for future research.
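The reported figures are F1 scores, the harmonic mean of precision and recall. The sketch below shows one common way to compute them. The averaging scheme is an assumption, since the abstract does not state it: macro-averaged F1 over intent labels, and micro-averaged, exact-match F1 over entity spans. The intent labels and spans used in the test values are hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def intent_macro_f1(gold: List[str], pred: List[str]) -> float:
    """Assumed scheme: F1 per intent label, then an unweighted (macro) mean."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f1(precision, recall))
    return sum(scores) / len(scores) if scores else 0.0


def entity_f1(gold: Sequence[Set[Hashable]], pred: Sequence[Set[Hashable]]) -> float:
    """Assumed scheme: micro-averaged F1; a predicted entity counts only on an exact match.

    Each element holds the set of entity spans for one utterance.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return f1(precision, recall)
```

Macro averaging weights rare intents as heavily as frequent ones, so it can differ noticeably from a micro or support-weighted average on an imbalanced dataset. Any comparison against the reported 91% and 93% should use the same scheme as the paper.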

Paper Structure (14 sections, 8 figures, 6 tables)

This paper contains 14 sections, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Distribution of utterances in each domain
  • Figure 2: An example of the annotation done to specify the entities of a user's utterance.
  • Figure 3: A very simple scenario in this dataset
  • Figure 4: Distribution of turns in dialogues
  • Figure 5: Distribution of tokens in utterances
  • ...and 3 more figures
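Distributions like those in Figures 4 and 5 can be tabulated with a couple of counters. This minimal sketch assumes a hypothetical in-memory layout in which each dialogue is a list of utterance strings and one turn corresponds to one utterance. It also uses plain whitespace tokenization, which is a simplification for Persian text. Neither assumption is taken from the paper.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical layout: a dialogue is a list of utterance strings, one per turn.
Dialogue = List[str]


def turns_histogram(dialogues: List[Dialogue]) -> Dict[int, int]:
    """Map number of turns -> number of dialogues with that many turns."""
    return dict(sorted(Counter(len(d) for d in dialogues).items()))


def tokens_histogram(dialogues: List[Dialogue]) -> Dict[int, int]:
    """Map tokens per utterance -> number of utterances.

    Whitespace splitting is a simplifying assumption; a proper Persian tokenizer
    would also need to handle the zero-width non-joiner (U+200C).
    """
    counts = Counter(len(u.split()) for d in dialogues for u in d)
    return dict(sorted(counts.items()))
```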