EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?
Xinyan Chen, Jiaxin Ge, Hongming Dai, Qiang Zhou, Qiuxuan Feng, Jingtong Hu, Yizhou Wang, Jiaming Liu, Shanghang Zhang
TL;DR
EmpathyAgent asks whether embodied agents can conduct human-like empathetic actions. It introduces a first-of-its-kind benchmark comprising 10k multimodal samples and a three-challenge pipeline (Scenario Understanding, Empathetic Planning, Empathetic Actions) in the VirtualHome environment. The authors establish an evaluation framework with both reference-based and reference-free metrics to quantify empathetic understanding and behavior, and they benchmark several LLMs and multimodal models, finding that current systems struggle with empathetic actions. They further show that instruction finetuning of Llama3-8B yields substantial improvements, with performance sometimes surpassing GPT-4-turbo on reference-based metrics, and that RLHF provides additional gains. The benchmark is scalable, and the code and data are publicly released, with the aim of advancing grounded, empathetic embodied agents and enabling reproducible study of empathetic AI in real-world-like settings.
Abstract
Empathy is fundamental to human interactions, yet it remains unclear whether embodied agents can provide human-like empathetic support. Existing works have studied agents' task-solving and social-interaction abilities, but whether agents can understand empathetic needs and conduct empathetic behaviors remains overlooked. To address this, we introduce EmpathyAgent, the first benchmark to evaluate and enhance agents' empathetic actions across diverse scenarios. EmpathyAgent contains 10,000 multimodal samples with corresponding empathetic task plans and three different challenges. To systematically evaluate the agents' empathetic actions, we propose an empathy-specific evaluation suite that assesses the agents' empathy process. We benchmark current models and find that exhibiting empathetic actions remains a significant challenge. We also train Llama3-8B on EmpathyAgent and find that doing so can potentially enhance its empathetic behavior. By establishing a standard benchmark for evaluating empathetic actions, we hope to advance research in empathetic embodied agents. Our code and data are publicly available at https://github.com/xinyan-cxy/EmpathyAgent.
