PERCY: Personal Emotional Robotic Conversational System

Zhijin Meng; Mohammed Althubyani; Shengyuan Xie; Imran Razzak; Eduardo B. Sandoval; Mahdi Bamdad; Francisco Cruz

TL;DR

The paper addresses the lack of emotional awareness and long-term personalization in LLM-driven social robots by combining open-domain GPT-4 reasoning with real-time multimodal affect grounding. It introduces PERCY, a ROS-based system that fuses visual and textual cues and uses them to condition the responses of a multimodal GPT-4 reasoning engine, so that the robot's verbal and non-verbal behavior stay synchronized. In automated and human evaluations, PERCY shows strong empathy and personalization with competitive naturalness. It reaches an emotion-recognition accuracy of $92.0\%$ and an end-to-end latency of $1.7\,\text{s}$, and it outperforms text-only GPT-4 and EmpGPT-3 on personalization and diversity. The work offers practical insights and a foundation for scalable, ethically grounded, emotionally intelligent human–robot interaction in open-domain settings, and the authors intend to open-source it to support future research.

Abstract

Traditional rule-based conversational robots, constrained by predefined scripts and static response mappings, fundamentally lack adaptability for personalized, long-term human interaction. While Large Language Models (LLMs) like GPT-4 have revolutionized conversational AI through open-domain capabilities, current social robots implementing LLMs still lack emotional awareness and continuous personalization. This dual limitation hinders their ability to sustain engagement across multiple interaction sessions. We bridge this gap with PERCY (Personal Emotional Robotic Conversational sYstem), a system designed to enable open-domain, multi-turn dialogues by dynamically analyzing users' real-time facial expressions and vocabulary to tailor responses based on their emotional state. Built on a ROS-based multimodal framework, PERCY integrates a fine-tuned GPT-4 reasoning engine, combining textual sentiment analysis with visual emotional cues to accurately assess and respond to user emotions. We evaluated PERCY's performance through various dialogue quality metrics, showing strong coherence, relevance, and diversity. Human evaluations revealed PERCY's superior personalization and comparable naturalness to other models. This work highlights the potential for integrating advanced multimodal perception and personalization in social robot dialogue systems.
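
To make the idea of combining textual sentiment with visual emotional cues concrete, the following is a minimal sketch of one possible late-fusion step, not PERCY's actual implementation. The emotion label set, the `fuse_emotions` and `build_system_prompt` helpers, the `visual_weight` parameter, and the prompt wording are all hypothetical; the abstract does not specify how the two modalities are weighted or how the result is passed to GPT-4.

```python
from typing import Dict

# Hypothetical label set; the paper's actual emotion categories may differ.
EMOTIONS = ("happy", "sad", "angry", "neutral")


def fuse_emotions(
    visual: Dict[str, float],
    textual: Dict[str, float],
    visual_weight: float = 0.5,
) -> Dict[str, float]:
    """Blend two per-emotion score dicts by weighted average.

    The result is renormalized so the fused scores sum to 1.
    The equal default weighting is an assumption for illustration.
    """
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must be in [0, 1]")
    fused = {
        e: visual_weight * visual.get(e, 0.0)
        + (1.0 - visual_weight) * textual.get(e, 0.0)
        for e in EMOTIONS
    }
    total = sum(fused.values())
    if total == 0.0:
        # No evidence from either modality: fall back to a uniform guess.
        return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    return {e: score / total for e, score in fused.items()}


def build_system_prompt(fused: Dict[str, float]) -> str:
    """Turn the fused estimate into a conditioning instruction for an LLM."""
    dominant = max(fused, key=fused.get)
    return (
        f"The user currently appears {dominant} "
        f"(estimated confidence {fused[dominant]:.2f}). "
        "Respond empathetically and adapt your tone to this emotional state."
    )


# Example inputs standing in for a facial-expression classifier and a
# text sentiment analyzer (values are made up for illustration).
visual_scores = {"happy": 0.1, "sad": 0.7, "angry": 0.1, "neutral": 0.1}
text_scores = {"happy": 0.2, "sad": 0.4, "angry": 0.0, "neutral": 0.4}

fused_scores = fuse_emotions(visual_scores, text_scores)
prompt = build_system_prompt(fused_scores)
print(prompt)
```

In a design like this, the generated string would be sent as the system message of each dialogue turn, so the language model sees the current affect estimate alongside the user's utterance. A weighted average is only one option; a learned fusion model or per-modality confidence gating would fill the same role.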
