MERCI: Multimodal Emotional and peRsonal Conversational Interactions Dataset
Mohammed Althubyani, Zhijin Meng, Shengyuan Xie, Cha Seung, Imran Razzak, Eduardo B. Sandoval, Baki Kocaballi, Francisco Cruz
TL;DR
The paper tackles the scarcity of open-domain, multimodal datasets for human–robot interaction by introducing MERCI, a dataset built from real, emotionally meaningful conversations with 30 participants and enriched with personal profiles. It presents PERCY, a five-module robot conversation framework on the ARI platform that fuses speech recognition, sentiment analysis, facial emotion recognition, and LLM-based dialogue management (GPT-4) to generate empathetic, personalized responses. The dataset supports training and evaluation of emotionally aware dialogue systems, and both automatic and user-based evaluations indicate high naturalness, engagement, fluency, relevance, empathy, and consistency. The work shows how personal data and emotional cues can be integrated into open-domain HRI, and it offers a resource for researchers and developers pursuing more natural and supportive human–robot conversations.
Abstract
The integration of conversational agents into our daily lives has become increasingly common, yet many of these agents cannot engage in deep interactions with humans. At the same time, there is a noticeable shortage of datasets that capture multimodal information from human-robot interaction dialogues. To address this gap, we recorded a novel multimodal dataset (MERCI) that encompasses rich embodied interaction data. Participants first completed a questionnaire that captured their profiles on ten topics, such as hobbies and favorite music. We then initiated conversations between the robot and the participants, leveraging GPT-4 to generate contextually appropriate responses based on the participant's profile and emotional state, as determined by facial expression recognition and sentiment analysis. Automatic and user evaluations were conducted to assess the overall quality of the collected data. Both evaluations indicated a high level of naturalness, engagement, fluency, consistency, and relevance in the conversations, as well as the robot's ability to provide empathetic responses. The dataset is derived from genuine interactions with the robot, involving participants who provided personal information and conveyed actual emotions.
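
As a rough illustration of the response-generation step described above, the following minimal sketch shows one way a participant profile and detected emotional cues could be combined into a single text prompt for a language model. It is not the authors' implementation: the `TurnContext` structure, the `build_prompt` function, the prompt wording, and the example labels are all hypothetical, and the call to the language model itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TurnContext:
    """Inputs for one user turn (hypothetical structure, not from the paper)."""
    utterance: str        # transcript produced by speech recognition
    facial_emotion: str   # label from facial expression recognition, e.g. "sad"
    sentiment: str        # label from sentiment analysis, e.g. "negative"
    profile: dict = field(default_factory=dict)  # questionnaire answers by topic


def build_prompt(ctx: TurnContext) -> str:
    """Combine profile and emotional cues into one prompt string (assumed wording)."""
    # Sort topics so the prompt is deterministic for a given profile.
    profile_lines = "\n".join(
        f"- {topic}: {answer}" for topic, answer in sorted(ctx.profile.items())
    )
    return (
        "You are a friendly robot having an open-domain conversation.\n"
        f"Participant profile:\n{profile_lines}\n"
        f"Detected facial emotion: {ctx.facial_emotion}\n"
        f"Detected sentiment: {ctx.sentiment}\n"
        f"Participant said: {ctx.utterance}\n"
        "Reply empathetically and refer to the profile where relevant."
    )


# Illustrative values only; they are not taken from the dataset.
example = TurnContext(
    utterance="I missed my favorite band's concert last night.",
    facial_emotion="sad",
    sentiment="negative",
    profile={"hobbies": "hiking", "favorite music": "indie rock"},
)
prompt = build_prompt(example)
print(prompt)
```

In a full system the resulting string would be passed to the dialogue model; keeping prompt construction in a pure function like this makes the fusion step easy to inspect and test separately from the model call.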
