From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents
Erfan Shayegani, Jina Suh, Andy Wilson, Nagu Rangan, Javier Hernandez
TL;DR
This work targets the gap between generic empathy in conversational AI and the need for task- and context-specific empathy. It analyzes real-world SENSE-7 data to uncover how user expectations of empathy vary by task and how perceived empathy correlates with satisfaction. Leveraging these insights, the authors build a synthetic multi-turn data generation pipeline, define task-specific empathy patterns, and train four context-specific empathetic expert adapters (LoRA-based) on frozen LLM backbones, guided by both generative and learning-based reward models. Across LLMs of different scales, the adapters outperform both the Baseline and System Prompt conditions at maintaining empathy and aligning it with user expectations, particularly in long multi-turn conversations, with practical gains for user satisfaction, robustness, and privacy-preserving evaluation. The approach offers a concrete path to deploying context-aware empathetic agents in real-world settings, with implications for RLHF integration, mixture-of-experts architectures, and ethical deployment.
Abstract
Empathy is a critical factor in fostering positive user experiences in conversational AI. While models can display empathy, it is often generic rather than tailored to specific tasks and contexts. In this work, we introduce a novel framework for developing and evaluating context-specific empathetic large language models (LLMs). We first analyze a real-world conversational dataset consisting of 672 multi-turn conversations across 8 tasks, revealing significant differences between the empathy users expected before a conversation and the empathy they experienced after it. To help minimize this gap, we develop a synthetic multi-turn conversational generation pipeline and steer responses toward context-dependent empathy patterns that we define to more closely match users' expectations. We then train empathetic expert adapters for context-specific empathy that specialize in varying empathy levels based on the recognized task. Our empirical results demonstrate a 72.66% reduction in the gap between perceived and desired empathy, with scores increasing by an average factor of 2.43 as measured by our metrics and reward models. Additionally, our trained empathetic expert adapters are more effective at preserving empathy patterns throughout conversation turns, outperforming system prompts, whose impact tends to diminish dramatically as conversations lengthen.
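To make the notion of a "gap reduction" concrete, the sketch below computes a relative reduction in the mean absolute gap between expected and perceived empathy scores. The abstract does not spell out the paper's exact metric, so the function name `gap_reduction_pct`, the use of a mean absolute gap, and the example scores are all illustrative assumptions rather than the authors' procedure or data.

```python
from typing import Sequence


def gap_reduction_pct(
    expected: Sequence[float],
    baseline_perceived: Sequence[float],
    adapted_perceived: Sequence[float],
) -> float:
    """Percent reduction in the mean absolute expected-vs-perceived gap.

    Hypothetical helper: the metric definition is an assumption made for
    illustration, not the definition used in the paper.
    """
    if not (len(expected) == len(baseline_perceived) == len(adapted_perceived)):
        raise ValueError("all score lists must have the same length")
    if len(expected) == 0:
        raise ValueError("score lists must be non-empty")

    n = len(expected)
    # Mean absolute gap between what users expected and what they perceived.
    baseline_gap = sum(abs(e - p) for e, p in zip(expected, baseline_perceived)) / n
    adapted_gap = sum(abs(e - p) for e, p in zip(expected, adapted_perceived)) / n

    if baseline_gap == 0:
        raise ValueError("baseline gap is zero; relative reduction is undefined")
    return 100.0 * (baseline_gap - adapted_gap) / baseline_gap


# Made-up scores on an arbitrary scale, one entry per conversation.
expected = [5, 4, 5, 3]
baseline = [3, 3, 2, 3]   # perceived empathy without an adapter
adapted = [4, 4, 4, 3]    # perceived empathy with an adapter

reduction = gap_reduction_pct(expected, baseline, adapted)
print(f"gap reduction: {reduction:.2f}%")
```

Under this reading, a value of 100% would mean perceived empathy matches expectations exactly, while a negative value would mean the gap widened.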
