Large Language Models Produce Responses Perceived to be Empathic

Yoon Kyung Lee; Jina Suh; Hongli Zhan; Junyi Jessy Li; Desmond C. Ong

TL;DR

LLMs were asked to generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations. The results highlight the potential of using LLMs to enhance human peer support in contexts where empathy is important.

Abstract

Large Language Models (LLMs) have demonstrated surprising performance on many tasks, including writing supportive messages that display empathy. Here, we had these models generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations. Across two studies (N=192, 202), we showed human raters a variety of responses written by several models (GPT4 Turbo, Llama2, and Mistral), and had people rate these responses on how empathic they seemed to be. We found that LLM-generated responses were consistently rated as more empathic than human-written responses. Linguistic analyses also show that these models write in distinct, predictable "styles", in terms of their use of punctuation, emojis, and certain words. These results highlight the potential of using LLMs to enhance human peer support in contexts where empathy is important.

Paper Structure (23 sections, 3 figures, 4 tables)

Introduction
Related Work
Human empathy vs AI-displayed empathy
LLMs, zero-shot learning, and displayed empathy
Experimental studies
Study 1 Methods
Stimuli
Human response
Language models
Prompt conditions
Participants and procedures
Study 1 Results
Study 1 Discussion
Study 2 Methods
Linguistic analyses
...and 8 more sections

Figures (3)

Figure 1: Base prompt (Study 1 and 2; all models) and "Empathy level" prompts (Study 1, GPT4-only)
Figure 2: Results of Study 1 (Left) and Study 2 (Right). Mean empathy ratings with 95% Confidence Intervals, calculated across posts.
Figure 3: Study 2: Results from LIWC Analyses. Top: Pronoun frequency, Middle: Punctuation, Bottom: Emotion words
