Are LLMs Empathetic to All? Investigating the Influence of Multi-Demographic Personas on a Model's Empathy

Ananya Malik, Nazanin Sabri, Melissa Karnaze, Mai Elsherief

TL;DR

This study probes whether LLMs exhibit equitable empathy across demographic personas defined by age, culture, and gender, using 315 intersectional configurations across four LLM families and the ISEAR dataset. It separately quantifies affective empathy via Earth Mover's Distance on NRC emotion intensities and cognitive empathy via the EPITOME framework, analyzing both isolated and intersectional attribute conditions. The results reveal substantial cross-attribute variation and frequent misalignments with real-world emotion patterns, especially for Confucian cultures and younger ages, with intersectionality often dampening expected empathy signals. The work argues for empathy-aware alignment frameworks that ensure inclusive, culturally aware, and bias-mitigated empathic behavior in AI systems.
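As a rough illustration of the affective measure mentioned above, the following sketch computes a one-dimensional Earth Mover's Distance between two emotion-intensity histograms. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `emd_1d`, the four-bin layout, and the sample values are hypothetical, and this section does not say how the paper bins or aggregates NRC intensities.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two histograms.

    Assumes both histograms share the same ordered, unit-spaced bins
    (a hypothetical layout; the paper's binning is not given here).
    For 1D histograms the distance equals the sum of absolute
    differences between the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")

    # Normalise so each histogram carries total mass 1.
    p_norm = [x / total_p for x in p]
    q_norm = [x / total_q for x in q]

    # Compare the running totals bin by bin.
    cdf_p = accumulate(p_norm)
    cdf_q = accumulate(q_norm)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Hypothetical intensity histograms (low to high intensity bins).
baseline_response = [0.1, 0.2, 0.4, 0.3]
persona_response = [0.3, 0.4, 0.2, 0.1]

shift = emd_1d(baseline_response, persona_response)
print(f"EMD shift: {shift:.2f}")
```

A larger value means the persona-conditioned response distributes emotion intensity more differently from the baseline; identical histograms give a distance of zero.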

Abstract

Large Language Models' (LLMs) ability to converse naturally is empowered by their ability to empathetically understand and respond to their users. However, emotional experiences are shaped by demographic and cultural contexts. This raises an important question: Can LLMs demonstrate equitable empathy across diverse user groups? We propose a framework to investigate how LLMs' cognitive and affective empathy vary across user personas defined by intersecting demographic attributes. Our study introduces a novel intersectional analysis spanning 315 unique personas, constructed from combinations of age, culture, and gender, across four LLMs. Results show that attributes profoundly shape a model's empathetic responses. Interestingly, we see that adding multiple attributes at once can attenuate and reverse expected empathy patterns. We show that the models broadly reflect real-world empathetic trends, with notable misalignments for certain groups, such as those from Confucian culture. We complement our quantitative findings with qualitative insights to uncover model behaviour patterns across different demographic groups. Our findings highlight the importance of designing empathy-aware LLMs that account for demographic diversity to promote more inclusive and equitable model behaviour.

Paper Structure

This paper contains 49 sections, 5 equations, 5 figures, 17 tables.

Figures (5)

  • Figure 1: We evaluate the model's ability to express empathy on the same emotional experience but for users from different demographics of age, gender, and culture. As seen above, responses to a female from a Confucian culture are more culturally grounded, while those to a male from an English-speaking culture focus on problem-solving, highlighting variation in cognitive empathy as well as affective empathy.
  • Figure 2: Distribution of Affect (top row) and Cognitive (bottom row) score shifts across models when attributes are injected independently. Left to right: LLaMA-3-70B, GPT-4o Mini, DeepSeek-v3, Gemini-2.0 Flash.
  • Figure 3: Distribution of Affect (top row) and Cognitive (bottom row) score shifts across models when attributes are injected intersectionally. Left to right: LLaMA-3-70B, GPT-4o Mini, DeepSeek-v3, Gemini-2.0 Flash.
  • Figure 4: Least aligned attributes across every model and emotion. The size of the attribute indicates the degree of misalignment from the model's internal state. The 0-17 age group, Gender Queer, and Confucian culture attributes are frequently among the least aligned.
  • Figure 5: Distribution of intensity scores in the NRC Emotion Lexicon for each basic emotion.