Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents

Mehdi Arjmand, Farnaz Nouraei, Ian Steenstra, Timothy Bickmore

TL;DR

The paper defines empathic grounding as an extension of grounding that includes listener empathy for the speaker's affect and presents a multimodal framework that uses speech and facial cues alongside a large language model to generate grounding moves for embodied agents. It introduces a testbed using the Furhat robot and a Wizard-of-Oz protocol to evaluate grounding moves in pain-interview conversations, demonstrating that empathic grounding enhances perceived empathy, emotional intelligence, trust, and rapport compared to a neutral backchannel baseline. The results highlight the value of emotion-aware, multimodal grounding in naturalistic human-robot interaction and identify avenues for expanding modalities, discourse context, and cross-cultural grounding in future work.

Abstract

We introduce the concept of "empathic grounding" in conversational agents as an extension of Clark's conceptualization of grounding in conversation in which the grounding criterion includes listener empathy for the speaker's affective state. Empathic grounding is generally required whenever the speaker's emotions are foregrounded and can make the grounding process more efficient and reliable by communicating both propositional and affective understanding. Both speaker expressions of affect and listener empathic grounding can be multimodal, including facial expressions and other nonverbal displays. Thus, models of empathic grounding for embodied agents should be multimodal to facilitate natural and efficient communication. We describe a multimodal model that takes as input user speech and facial expression to generate multimodal grounding moves for a listening agent using a large language model. We also describe a testbed to evaluate approaches to empathic grounding, in which a humanoid robot interviews a user about a past episode of pain and then has the user rate their perception of the robot's empathy. We compare our proposed model to one that only generates non-affective grounding cues in a between-subjects experiment. Findings demonstrate that empathic grounding increases user perceptions of empathy, understanding, emotional intelligence, and trust. Our work highlights the role of emotion awareness and multimodality in generating appropriate grounding moves for conversational agents.
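The pipeline described in the abstract (user speech plus facial expression in, a multimodal grounding move out) can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the `UserTurn` and `GroundingMove` types, the `key: value` reply format, and the `stub_llm` stand-in are all hypothetical, chosen only to show the data flow. A real system would replace `stub_llm` with a call to an actual language model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UserTurn:
    transcript: str         # text from a speech recognizer (assumed upstream)
    facial_expression: str  # label from a facial-expression recognizer, e.g. "sad"


@dataclass
class GroundingMove:
    utterance: str          # verbal part of the agent's grounding move
    agent_expression: str   # nonverbal display for the embodied agent


def build_prompt(question: str, turn: UserTurn) -> str:
    """Combine both input modalities into one prompt (hypothetical wording)."""
    return (
        "You are a listening agent. Acknowledge what the user said and how "
        "they appear to feel.\n"
        f"Agent question: {question}\n"
        f"User said: {turn.transcript}\n"
        f"User facial expression: {turn.facial_expression}\n"
        "Reply with two lines: 'expression: <label>' and 'utterance: <text>'."
    )


def parse_move(raw: str) -> GroundingMove:
    """Parse 'key: value' lines; fall back to a neutral backchannel."""
    fields = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return GroundingMove(
        utterance=fields.get("utterance", "Mm-hmm."),
        agent_expression=fields.get("expression", "neutral"),
    )


def generate_grounding_move(
    question: str, turn: UserTurn, llm: Callable[[str], str]
) -> GroundingMove:
    """Run one question/response/grounding-move cycle."""
    return parse_move(llm(build_prompt(question, turn)))


def stub_llm(prompt: str) -> str:
    """Placeholder for a real language model; returns a canned reply."""
    return "expression: concerned\nutterance: That sounds really painful."


turn = UserTurn("My back hurt so much I could not sleep.", "sad")
move = generate_grounding_move("Can you tell me about the pain?", turn, stub_llm)
print(move.agent_expression, "|", move.utterance)
```

Passing the model in as a plain callable keeps the sketch testable without network access. The neutral fallback in `parse_move` loosely mirrors the non-affective backchannel that the paper uses as its comparison condition.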

Paper Structure (28 sections, 3 figures, 4 tables)

Figures (3)

  • Figure 1: User interacting with the robot testbed for the empathic grounding model
  • Figure 2: Example Discourse Segment. 1. Agent question. 2. User response. 3. Agent grounding move.
  • Figure 3: Ratings of the BACKCHANNEL and EMPATHIC GROUNDING robots. See the Measures section for descriptions of the other measures. (*) indicates $p<0.05$ and (**) indicates $p<0.01$.