EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?

Xinyan Chen, Jiaxin Ge, Hongming Dai, Qiang Zhou, Qiuxuan Feng, Jingtong Hu, Yizhou Wang, Jiaming Liu, Shanghang Zhang

TL;DR

EmpathyAgent addresses whether embodied agents can conduct human-like empathetic actions by introducing a first-of-its-kind benchmark comprising 10k multimodal samples and a three-challenge pipeline (Scenario Understanding, Empathetic Planning, Empathetic Actions) in the VirtualHome environment. The authors establish a comprehensive evaluation framework with both reference-based and reference-free metrics to quantify empathetic understanding and behavior, and they benchmark several LLMs and multimodal models, finding current systems struggle with empathetic actions. They further demonstrate that instruction finetuning on Llama3-8B yields substantial improvements, with performance sometimes surpassing GPT-4-turbo on reference-based metrics, and RLHF providing additional gains. The benchmark’s scalability, along with public release of code and data, aims to advance the development of grounded, empathetic embodied agents and enables principled, reproducible study of empathetic AI in real-world-like settings.

Abstract

Empathy is fundamental to human interactions, yet it remains unclear whether embodied agents can provide human-like empathetic support. Existing works have studied agents' task-solving and social-interaction abilities, but whether agents can understand empathetic needs and conduct empathetic behaviors remains overlooked. To address this, we introduce EmpathyAgent, the first benchmark to evaluate and enhance agents' empathetic actions across diverse scenarios. EmpathyAgent contains 10,000 multimodal samples with corresponding empathetic task plans and three different challenges. To systematically evaluate the agents' empathetic actions, we propose an empathy-specific evaluation suite that evaluates the agents' empathy process. We benchmark current models and find that exhibiting empathetic actions remains a significant challenge. Meanwhile, we train Llama3-8B using EmpathyAgent and find it can potentially enhance empathetic behavior. By establishing a standard benchmark for evaluating empathetic actions, we hope to advance research in empathetic embodied agents. Our code and data are publicly available at https://github.com/xinyan-cxy/EmpathyAgent.

Paper Structure

This paper contains 59 sections, 2 equations, 30 figures, 8 tables.

Figures (30)

  • Figure 1: We propose EmpathyAgent, the first benchmark to evaluate and enhance the empathetic actions of embodied agents. In a simulated environment, the embodied agent is tasked to observe a scenario and then perform responsive empathetic actions. In this example, the agent first observes that there is a person sitting on the sofa and sighing. Considering the background information, the agent realizes the person is sad, and then conducts the action of bringing some water to the person. Meanwhile, EmpathyAgent can also be used to train embodied agents and boost empathetic behaviors.
  • Figure 2: An example of EmpathyAgent. The agent is provided with an input scenario, which contains three parts: a character with a personal background; a video of the character taking a sequence of actions (e.g., rushing to get the phone and then getting the apples); and a language cue from the character (e.g., saying something while performing the actions). To perform empathetic actions after observing this scenario, the agent needs to conduct three steps: (1) Scenario Understanding: Based on the scenario, the agent goes through a cognitive and affective process to determine the emotional state of the person and the possible causes that lead to this state. (2) Empathetic Planning: Based on the understanding from the internal empathy process, the agent comes up with possible plans of what actions to conduct under this scenario. The agent should reason about which plan meets the empathetic needs of the person based on the person's personal background. (3) Empathetic Actions: Finally, based on the high-level plan, the agent outputs a series of grounded and executable empathetic actions and performs them in the environment.
  • Figure 3: Benchmark creation pipeline. Step 1: we generate diverse scenarios. To do this, we sample a character and the character's input action. We use them to retrieve data from EmpatheticDialogues and use them together to generate a scenario description and the person's dialogue. The retrieval step ensures the generated scenario's diversity. Step 2: we generate an empathetic response for each scenario. To do this, we use the scenario to retrieve the top two data points from EmpatheticDialogues and use each of them as a source to generate a corresponding empathetic response. We then let the model choose the more empathetic response by using human-annotated examples and explanations as in-context examples. In this way, we construct a pair of empathetic responses in which one is labeled more empathetic and the other less empathetic.
  • Figure 4: Qualitative results of GPT-4o. We test the scenario understanding and empathetic planning capability of GPT-4o. We find that GPT-4o has strong capabilities in empathetic scenario understanding and high-level empathetic planning.
  • Figure 5: Comparison between GPT-4 and instruction-tuned Llama3-8B. We sample 10 pairs of data and report the GPT win rate and human win rate. Specifically, we ask either GPT or a human annotator to choose which response is more empathetic. We find that instruction-finetuned Llama3-8B outperforms GPT-4-turbo with significantly fewer parameters, suggesting that the benchmark can potentially be leveraged to build a powerful empathetic agent.
  • ...and 25 more figures
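
The three steps described in the Figure 2 caption (Scenario Understanding, Empathetic Planning, Empathetic Actions) can be read as a simple staged pipeline. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class names, field names, the `run_pipeline` function, and the stub stage functions are all hypothetical, and the action strings are plain illustrative text rather than the VirtualHome script format. The stub values loosely follow the Figure 1 example (a person sighing on the sofa, the agent bringing water).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    """Input scenario as described in Figure 2 (field names are hypothetical)."""
    background: str            # the character's personal background
    video_actions: List[str]   # actions observed in the video
    dialogue: str              # language cue from the character


@dataclass
class EmpathyResult:
    understanding: str   # stage 1: emotional state and its likely causes
    plan: str            # stage 2: high-level empathetic plan
    actions: List[str]   # stage 3: grounded, executable actions


def run_pipeline(
    scenario: Scenario,
    understand: Callable[[Scenario], str],
    plan: Callable[[Scenario, str], str],
    ground: Callable[[str], List[str]],
) -> EmpathyResult:
    """Chain the three stages; each stage consumes the previous stage's output."""
    understanding = understand(scenario)
    high_level_plan = plan(scenario, understanding)
    actions = ground(high_level_plan)
    return EmpathyResult(understanding, high_level_plan, actions)


# Stub stages standing in for model calls (purely illustrative).
def stub_understand(s: Scenario) -> str:
    return "The person seems sad."


def stub_plan(s: Scenario, understanding: str) -> str:
    return "Bring the person some water."


def stub_ground(high_level_plan: str) -> List[str]:
    # Illustrative action strings, not real VirtualHome commands.
    return ["walk to kitchen", "grab glass of water",
            "walk to sofa", "give glass of water to person"]


scenario = Scenario(
    background="(hypothetical background text)",
    video_actions=["sit on sofa", "sigh"],
    dialogue="(hypothetical utterance)",
)
result = run_pipeline(scenario, stub_understand, stub_plan, stub_ground)
print(result.plan)
print(result.actions)
```

Passing the stages in as callables keeps the sketch independent of any particular model: each stub could be swapped for an LLM or multimodal model call, and each intermediate output can be inspected or scored separately, which matches the benchmark's framing of three distinct challenges.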