Creating, Using and Assessing a Generative-AI-Based Human-Chatbot-Dialogue Dataset with User-Interaction Learning Capabilities

Alfredo Cuzzocrea, Giovanni Pilato, Pablo Garcia Bringas

TL;DR

The paper addresses the need for emotion-aware dialogue datasets in customer-service contexts by generating synthetic conversations with ChatGPT 3.5 conditioned on target emotions and CEFR language levels (A2, B2, C2). It introduces a pipeline that yields per-turn emotional labels and CEFR annotations, including both Explicit and Implicit Emotion Dialogues, and applies quality control via a Quality of Interaction (QoI) metric alongside ARTE-based readability assessments. A set of experiments demonstrates emotionally coherent turn sequences across anger and surprise at multiple CEFR levels, with detailed readability analyses confirming alignment to the specified language complexity. The resulting dataset and analytics framework offer a scalable resource for training and evaluating emotion-aware, adaptive dialogue systems in customer support and related HCI domains.

Abstract

This study is a first step in ongoing work to develop a dataset of dialogues potentially useful for managing customer-service conversations between humans and AI chatbots. The approach uses ChatGPT 3.5 to generate the dialogues under two requirements: each dialogue must reflect a specific language proficiency level of the user, and the user must express a specific emotion during the interaction. The generated dialogues were then evaluated for overall quality. The complexity of the language used by both humans and AI agents was evaluated with standard complexity measurements. Furthermore, the attitudes and interaction patterns exhibited by the chatbot at each turn were stored so that common conversation patterns in specific emotional contexts can be detected later. The methodology could improve human-AI dialogue effectiveness and serve as a basis for systems that can learn from user interactions.

Paper Structure (11 sections, 17 figures)

This paper contains 11 sections, 17 figures.

Figures (17)

  • Figure 1: The overall schema of the proposed approach
  • Figure 2: The ARI average readability results for the A2, B2, and C2 CEFR levels both for the User and the Agent
  • Figure 3: The CAREC average readability results for the A2, B2, and C2 CEFR levels both for the User and the Agent
  • Figure 4: The CARECM average readability results for the A2, B2, and C2 CEFR levels both for the User and the Agent
  • Figure 5: The CML2 average readability results for the A2, B2, and C2 CEFR levels both for the User and the Agent
  • ...and 12 more figures
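
Of the readability measures named in the figure list, ARI (the Automated Readability Index, Figure 2) has a simple closed form: 4.71 × (characters per word) + 0.5 × (words per sentence) − 21.43. The sketch below is only an illustration of that standard formula, not the paper's implementation. The tokenization rules, the function name `automated_readability_index`, and the two sample utterances are assumptions made here and are not taken from the dataset.

```python
import re


def automated_readability_index(text: str) -> float:
    """Return the ARI score of `text` using the standard ARI formula.

    Tokenization is deliberately naive (an assumption of this sketch):
    sentences end at '.', '!' or '?', and words are runs of letters or
    digits, optionally joined by apostrophes (e.g. "don't").
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    # Count only letters and digits as characters, ignoring apostrophes.
    characters = sum(len(w.replace("'", "")) for w in words)
    return (
        4.71 * characters / len(words)
        + 0.5 * len(words) / len(sentences)
        - 21.43
    )


# Hypothetical user turns, written for this example only.
simple_turn = "My phone is broken. I am angry."
complex_turn = (
    "I am utterly astonished that your organisation considers "
    "this resolution remotely acceptable."
)

simple_score = automated_readability_index(simple_turn)
complex_score = automated_readability_index(complex_turn)
print(f"simple:  {simple_score:.2f}")
print(f"complex: {complex_score:.2f}")
```

Short sentences built from short words score lower than long sentences built from long words, which is the kind of separation one would expect between A2-style and C2-style turns. Scores on very short texts are noisy, so averaging over whole dialogues, as the figure captions suggest, is the more meaningful use.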