MERCI: Multimodal Emotional and peRsonal Conversational Interactions Dataset

Mohammed Althubyani, Zhijin Meng, Shengyuan Xie, Cha Seung, Imran Razzak, Eduardo B. Sandoval, Baki Kocaballi, Francisco Cruz

TL;DR

The paper tackles the scarcity of open-domain, multimodal datasets for human–robot interaction by introducing MERCI, a dataset built from real, emotionally meaningful conversations with 30 participants and enriched with personal profiles. It presents PERCY, a five-module robot conversation framework on the ARI platform that fuses speech recognition, sentiment analysis, facial emotion recognition, and LLM-based dialogue management (GPT-4) to generate empathetic, personalized responses. The dataset supports training and evaluation of emotionally aware dialog systems, with comprehensive automatic and user-based evaluations indicating high naturalness, engagement, fluency, relevance, empathy, and consistency. The work demonstrates the practical impact of integrating personal data and emotional cues into open-domain HRI, offering a valuable resource for researchers and developers pursuing more natural and supportive human–robot conversations.

Abstract

The integration of conversational agents into our daily lives has become increasingly common, yet many of these agents cannot engage in deep interactions with humans. Despite this, there is a noticeable shortage of datasets that capture multimodal information from human-robot interaction dialogues. To address this gap, we have recorded a novel multimodal dataset (MERCI) that encompasses rich embodied interaction data. The process involved asking participants to complete a questionnaire and gathering their profiles on ten topics, such as hobbies and favorite music. Subsequently, we initiated conversations between the robot and the participants, leveraging GPT-4 to generate contextually appropriate responses based on the participant's profile and emotional state, as determined by facial expression recognition and sentiment analysis. Automatic and user evaluations were conducted to assess the overall quality of the collected data. The results of both evaluations indicated a high level of naturalness, engagement, fluency, consistency, and relevance in the conversation, as well as the robot's ability to provide empathetic responses. It is worth noting that the dataset is derived from genuine interactions with the robot, involving participants who provided personal information and conveyed actual emotions.
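To make the response-generation step concrete, here is a minimal sketch of how a participant profile and the two emotion cues might be combined into a single text prompt for a dialogue LLM. It is not the authors' implementation: the `EmotionalState` class, the `build_prompt` function, the label values, and the prompt wording are all hypothetical, and the call to the language model itself is left out.

```python
from dataclasses import dataclass


@dataclass
class EmotionalState:
    # Hypothetical labels; the label sets used in the paper are not stated here.
    facial_emotion: str  # e.g. output of a facial emotion recognizer
    sentiment: str       # e.g. output of sentiment analysis on the transcript


def build_prompt(profile: dict[str, str], state: EmotionalState, user_utterance: str) -> str:
    """Assemble one text prompt from the profile, emotion cues, and last utterance."""
    # Sort topics so the prompt is deterministic for a given profile.
    profile_lines = "\n".join(
        f"- {topic}: {value}" for topic, value in sorted(profile.items())
    )
    return (
        "You are a social robot having an open-domain conversation.\n"
        f"Participant profile:\n{profile_lines}\n"
        f"Detected facial emotion: {state.facial_emotion}\n"
        f"Detected sentiment of the last utterance: {state.sentiment}\n"
        f"Participant said: {user_utterance}\n"
        "Reply empathetically and refer to the profile where relevant."
    )


if __name__ == "__main__":
    example = build_prompt(
        {"hobbies": "hiking", "favorite music": "jazz"},
        EmotionalState(facial_emotion="sad", sentiment="negative"),
        "I had a rough day.",
    )
    print(example)
```

Keeping the prompt assembly in a pure function like this makes it easy to log exactly what context accompanied each turn, which could be useful when inspecting recorded conversations.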

Paper Structure

This paper contains 29 sections, 5 figures.

Figures (5)

  • Figure 1: PERCY analyzes the user's emotional state through real-time facial emotion recognition and sentiment analysis. It generates an appropriate response based on that emotional state and then speaks the response back to the user.
  • Figure 2: In addition to the participant and the robot, two external cameras are positioned to record the experiment from the front and the side, providing a comprehensive overview of the conversation.
  • Figure 3: The data streams from the ARI robot head camera (a), the front camera (b), and the side camera (c).
  • Figure 4: The participant pool is diverse in terms of age, cultural background, and educational background.
  • Figure 5: Emotion distribution across all the conversations.