Practicing with Language Models Cultivates Human Empathic Communication

Aakriti Kumar, Nalin Poungpeth, Diyi Yang, Bruce Lambert, Matthew Groh

Abstract

Empathy is central to human connection, yet people often struggle to express it effectively. In blinded evaluations, large language models (LLMs) generate responses that are often judged more empathic than human-written ones. Yet when a response is attributed to AI, recipients feel less heard and validated than when comparable responses are attributed to a human. To probe and address this gap in empathic communication skill, we built Lend an Ear, an experimental conversation platform in which participants are asked to offer empathic support to an LLM role-playing personal and workplace troubles. From 33,938 messages spanning 2,904 text-based conversations between 968 participants and their LLM conversational partners, we derive a data-driven taxonomy of idiomatic empathic expressions in naturalistic dialogue. Based on a pre-registered randomized experiment, we present evidence that a brief LLM coaching intervention offering personalized feedback on how to communicate empathy effectively significantly boosts the alignment of participants' communication patterns with normative empathic communication patterns, relative to both a control group and a group that received video-based but non-personalized feedback. Moreover, we find evidence for a silent empathy effect: people feel empathy but systematically fail to express it. Nonetheless, participants reliably identify responses aligned with normative empathic communication criteria as more expressive of empathy. Together, these results advance the scientific understanding of how empathy is expressed and valued and demonstrate a scalable, AI-based intervention for scaffolding and cultivating it.

Paper Structure (19 sections, 12 figures, 15 tables)

This paper contains 19 sections, 12 figures, 15 tables.

Figures (12)

  • Figure 1: Overview of the Lend an Ear experiment. A. Examples of empathic responses by participants to a conversational partner's disclosure of job loss on the Lend an Ear platform that tend to make people feel heard (left panel) and that tend to not be effective at making people feel heard (right panel). B. User interface of the chat window with an LLM conversational partner (left) and personalized feedback from the AI coach (right). C. Experimental design flowchart illustrating participant recruitment, random assignment to four conditions (Control, Video, AI Coach, Combined), and the sequence of surveys, conversations, and feedback.
  • Figure 2: Communication patterns in conversations. A. Hierarchical taxonomy of empathic communication in naturalistic dialogues in Lend an Ear. This four-level structure integrates bottom-up discovery of 128 themes via k-sparse autoencoders with qualitatively coded top-down theoretical categories (Affective, Cognitive, Motivational, and Misattuned). B. Change in incidence of empathic communication strategies after feedback from AI Coach in Lend an Ear, shown separately for personal (left) and workplace (right) conversation contexts. Dots represent estimated change in percentage points shown with 95% confidence intervals. Categories are color-coded by type: Affective (green), Cognitive (cyan), Motivational (blue), and Misattuned (red). C. Frequency changes (post minus pre) across all 128 themes identified by the k-sparse autoencoder for the AI coach and Combined conditions, ranked by magnitude of change for personal and workplace scenarios. Categories are color-coded as Affective (green), Cognitive (cyan), Motivational (blue), and Misattuned (red).
  • Figure 3: Change in empathic communication behaviors. A. Each vertical line represents one participant, connecting their baseline score (conversation 1) to their post-intervention score (mean of conversations 2-3), represented by a dot. Dotted lines indicate change; darker lines indicate reliable change. The y-axis is the overall empathy score calculated as the sum of prescriptive behavior ratings minus proscriptive behavior ratings. The x-axis ranks participants by percentile, sorted by magnitude of change within each condition. Black circles mark post-intervention scores. The horizontal bar indicates the RCI threshold for reliable change. B. Standardized intervention effects (in SD units) on six preregistered dimensions of empathic communication. Bars show OLS regression coefficients comparing each condition to baseline, with 95% confidence intervals. Asterisks indicate statistical significance (* $p < 0.05$, ** $p < 0.01$, *** $p < 0.001$). All analyses follow the preregistered analysis plan.
  • Figure 4: Example conversational trajectory of a participant from the AI Coach condition. A. Conversation 1 (pre-training). B. Conversation 3 (post-training). C. Trajectory of empathic communication dimension ratings across three conversations for a top-improving participant.
  • Figure 5: Disconnect between expressed empathy and felt empathy. A. Relationship between trait empathy scores (Jordan Empathy Scale [jordan2016empathy] and SITES [konrath2018development]) and LLM-evaluated empathic communication performance across six behavioral dimensions. Each gray point represents an individual participant, with red dots indicating mean trait empathy scores and error bars showing 95% confidence intervals. B. Relationship between LLM-evaluated performance and participant self-reported empathic communication performance across three sub-components. Each point represents an individual participant's response and the corresponding LLM evaluation.
  • ...and 7 more figures
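
The overall empathy score and reliable-change threshold described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' code. The caption states only that the score is the sum of prescriptive behavior ratings minus proscriptive behavior ratings, and that the post-intervention score is the mean of conversations 2-3. Everything else here is an assumption: the function names, the example ratings, the baseline standard deviation, the reliability value, and the use of the Jacobson-Truax formula for the reliable change index (RCI).

```python
import math
from statistics import mean


def overall_empathy_score(prescriptive, proscriptive):
    """Sum of prescriptive behavior ratings minus sum of proscriptive ratings."""
    return sum(prescriptive) - sum(proscriptive)


def reliable_change_index(pre, post, sd_baseline, reliability):
    """Jacobson-Truax RCI. ASSUMPTION: the source does not state which RCI formula it uses."""
    se_measurement = sd_baseline * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return (post - pre) / se_difference


# Hypothetical ratings for one participant (three prescriptive and three
# proscriptive dimensions per conversation); these are not data from the paper.
conv1 = overall_empathy_score([2, 3, 2], [1, 2, 1])  # baseline conversation
conv2 = overall_empathy_score([4, 4, 3], [1, 1, 0])
conv3 = overall_empathy_score([4, 5, 4], [0, 1, 0])

# Post-intervention score is the mean of conversations 2 and 3, per the caption.
post = mean([conv2, conv3])

# sd_baseline and reliability are placeholder values chosen for illustration.
rci = reliable_change_index(conv1, post, sd_baseline=3.0, reliability=0.8)

# Conventional two-sided 95% cutoff for reliable change (assumed, not from the source).
is_reliable = abs(rci) > 1.96
print(f"baseline={conv1}, post={post}, RCI={rci:.2f}, reliable={is_reliable}")
```

Under these placeholder values the participant's change exceeds the cutoff, which corresponds to one of the darker lines in panel A of Figure 3.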