Large Language Models Produce Responses Perceived to be Empathic

Yoon Kyung Lee, Jina Suh, Hongli Zhan, Junyi Jessy Li, Desmond C. Ong

TL;DR

We had LLMs generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations; the results highlight the potential of using LLMs to enhance human peer support in contexts where empathy is important.

Abstract

Large Language Models (LLMs) have demonstrated surprising performance on many tasks, including writing supportive messages that display empathy. Here, we had these models generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations. Across two studies (N=192, 202), we showed human raters a variety of responses written by several models (GPT4 Turbo, Llama2, and Mistral), and had people rate these responses on how empathic they seemed to be. We found that LLM-generated responses were consistently rated as more empathic than human-written responses. Linguistic analyses also show that these models write in distinct, predictable "styles", in terms of their use of punctuation, emojis, and certain words. These results highlight the potential of using LLMs to enhance human peer support in contexts where empathy is important.

Paper Structure (23 sections, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Base prompt (Study 1 and 2; all models) and "Empathy level" prompts (Study 1, GPT4-only)
  • Figure 2: Results of Study 1 (left) and Study 2 (right). Mean empathy ratings with 95% confidence intervals, calculated across posts.
  • Figure 3: Study 2: Results from LIWC Analyses. Top: Pronoun frequency, Middle: Punctuation, Bottom: Emotion words