Is ChatGPT More Empathetic than Humans?

Anuradha Welivita, Pearl Pu

TL;DR

A comparison of the empathetic responses of ChatGPT, particularly its latest iteration, GPT-4, with human-generated responses across a wide range of emotional scenarios indicates that ChatGPT's responses receive an average empathy rating approximately 10% higher than those written by humans.

Abstract

This paper investigates the empathetic responding capabilities of ChatGPT, particularly its latest iteration, GPT-4, in comparison to human-generated responses to a wide range of emotional scenarios, both positive and negative. We employ a rigorous evaluation methodology, involving a between-groups study with 600 participants, to evaluate the level of empathy in responses generated by humans and ChatGPT. ChatGPT is prompted in two distinct ways: a standard approach and one explicitly detailing empathy's cognitive, affective, and compassionate counterparts. Our findings indicate that the average empathy rating of responses generated by ChatGPT exceeds that of responses crafted by humans by approximately 10%. Additionally, instructing ChatGPT to incorporate a clear understanding of empathy in its responses makes the responses align approximately 5 times more closely with the expectations of individuals possessing a high degree of empathy than human responses do. The proposed evaluation framework is scalable and adaptable, and can be used to assess the empathetic capabilities of newer and updated versions of large language models, eliminating the need to replicate the current study's results in future research.

Paper Structure

This paper contains 25 sections, 16 figures, and 13 tables.

Figures (16)

  • Figure 1: Between-subjects experiment design to evaluate the level of empathy demonstrated by ChatGPT compared to humans when responding to emotional situations.
  • Figure 2: Average empathy ratings for the human responses and the GPT-4 responses (based on the two prompts) for all, positive, and negative emotions. Error bars show the standard error of each average. The F-values computed using a one-way ANOVA test for all, positive, and negative emotions are also indicated. The corresponding p-values are all less than 0.001, which indicates very high statistical significance. The exact numerical values obtained from the statistical analysis are included in the appendix.
  • Figure 3: Distribution of the dialogue prompt-response pairs sampled from the EmpatheticDialogues dataset across the 32 positive and negative emotions.
  • Figure 4: The description of the task.
  • Figure 5: The tutorial.
  • ...and 11 more figures
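
The Figure 2 caption mentions two standard computations: standard-error bars on each group's average rating, and a one-way ANOVA F-value across the three response groups (human, GPT-4 with the standard prompt, GPT-4 with the empathy-defining prompt). The following is a minimal sketch of both, not the authors' analysis code. The group names, the helper function names, and every rating value are hypothetical and were invented for illustration; the paper's actual rating scale and data are not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(ratings):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(ratings) / sqrt(len(ratings))


def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of rating groups."""
    k = len(groups)                           # number of groups
    n_total = sum(len(g) for g in groups)     # total number of ratings
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical ratings (NOT data from the paper), one list per group.
ratings = {
    "human": [1, 2, 3],
    "gpt4_standard_prompt": [2, 3, 4],
    "gpt4_empathy_prompt": [3, 4, 5],
}

for name, values in ratings.items():
    print(f"{name}: mean={mean(values):.2f}, se={standard_error(values):.3f}")

f_value = one_way_anova_f(list(ratings.values()))
print(f"F = {f_value:.2f}")
```

A larger F-value means the differences between group averages are large relative to the spread of ratings within each group. Turning F into a p-value additionally requires the F-distribution with (k - 1, N - k) degrees of freedom, which a statistics library would normally supply.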