Whose Emotions and Moral Sentiments Do Language Models Reflect?

Zihao He, Siyi Guo, Ashwin Rao, Kristina Lerman

TL;DR

The paper investigates affective alignment in language models by comparing LM-generated emotional and moral content to real-world sociopolitical discourse on Twitter. It introduces a formal framework that quantifies alignment via distributions over emotions and moral foundations using Jensen-Shannon distance, augmented by Plutchik-based emotion proximity to capture related affective signals. Across two datasets (COVID-19 and Roe v. Wade) and 36 LMs, both default and steered prompting reveal substantial misalignment with liberals and conservatives, with liberal tendencies on COVID-19 that persist despite steering. Steering improves alignment for many instruction-tuned models but fails to reach the partisan baseline and does not fully mitigate biases, suggesting systemic affective biases embedded in current LMs. The work provides a foundational framework for evaluating and addressing affective representativeness in AI systems and highlights important implications for social impact, moderation, and fairness in AI deployment.

Abstract

Language models (LMs) are known to represent the perspectives of some social groups better than others, which may impact their performance, especially on subjective tasks such as content moderation and hate speech detection. To explore how LMs represent different perspectives, existing research focused on positional alignment, i.e., how closely the models mimic the opinions and stances of different groups, e.g., liberals or conservatives. However, human communication also encompasses emotional and moral dimensions. We define the problem of affective alignment, which measures how LMs' emotional and moral tone represents those of different groups. By comparing the affect of responses generated by 36 LMs to the affect of Twitter messages, we observe significant misalignment of LMs with both ideological groups. This misalignment is larger than the partisan divide in the U.S. Even after steering the LMs towards specific ideological perspectives, the misalignment and liberal tendencies of the model persist, suggesting a systemic bias within LMs.

Paper Structure (29 sections, 2 equations, 8 figures, 4 tables)


Figures (8)

  • Figure 1: The framework for evaluating affective alignment of LMs. We first prompt LMs to generate tweets on a topic using default prompting or steered prompting. The distributions of emotions and moral sentiments of LM-generated tweets are then compared to those of human-authored tweets. Affective alignment is measured as one minus the Jensen-Shannon distance (JSD) between the two distributions.
  • Figure 2: Default affective alignment $S$ of different LMs with the two ideological groups -- liberals ($g_l$) and conservatives ($g_c$), measured by emotions. * indicates that the model's alignment with the two ideological groups differs significantly at $p<0.05$. For each LM, the alignment is averaged over the topics related to the issue, with means shown by circles and standard deviations shown by error bars. Base LMs and instruction-tuned LMs are separated by the black horizontal dashed line. The alignment between the two ideological groups themselves (above the red horizontal dashed line) is measured as a baseline.
  • Figure 3: Steered affective alignment $S$ of different instruction-tuned LMs with the two ideological groups -- liberals ($g_l$) and conservatives ($g_c$), measured by emotions, for . * indicates that the liberal-steered model's alignment with the two ideological groups differs significantly at $p<0.05$; ^ indicates the same for the conservative-steered model. Left-facing triangles represent models under liberal steered prompting; right-facing triangles represent models under conservative steered prompting; unfilled circles represent models under default prompting. For each LM, the alignment is averaged over the topics detected within the dataset. The alignment between the two ideological groups themselves (above the red horizontal dashed line) is measured as a baseline.
  • Figure 4: Distribution of affect (emotions and moral foundations) on the topic "COVID-19 mask mandates and policies" in COVID-19 Tweets, for human-authored tweets and for tweets generated by different LMs under different prompting methods.
  • Figure 5: Default affective alignment $S$ of different LMs with the two ideological groups -- liberals ($g_l$) and conservatives ($g_c$), measured by moral foundations. * indicates that the liberal-steered model's alignment with the two ideological groups differs significantly at $p<0.05$. For each LM, the alignment is averaged over the topics detected within the dataset, with means shown by circles and standard deviations shown by error bars. Base LMs and instruction-tuned LMs are separated by the black horizontal dashed line. The alignment between the two ideological groups themselves (above the red horizontal dashed line) is measured as a baseline.
  • ...and 3 more figures
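
The alignment score described in the Figure 1 caption, one minus the Jensen-Shannon distance between the LM and human affect distributions, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the four-emotion category list, and the toy labels are hypothetical, the base-2 logarithm (which bounds the distance to [0, 1]) is an assumption, and the Plutchik-based emotion proximity mentioned in the TL;DR is left out.

```python
import math
from collections import Counter


def to_distribution(labels, categories):
    """Turn a list of affect labels into probabilities over a fixed category list."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labels from the given categories")
    return [counts[c] / total for c in categories]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base 2, assumed)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding errors


def affective_alignment(lm_dist, human_dist):
    """Alignment S = 1 - JSD; 1 means identical distributions, 0 means disjoint."""
    return 1.0 - js_distance(lm_dist, human_dist)


# Hypothetical category list and toy labels, for illustration only.
EMOTIONS = ["anger", "fear", "joy", "sadness"]
lm_labels = ["joy", "joy", "fear", "sadness"]
human_labels = ["anger", "anger", "fear", "sadness"]

lm_dist = to_distribution(lm_labels, EMOTIONS)
human_dist = to_distribution(human_labels, EMOTIONS)
score = affective_alignment(lm_dist, human_dist)
print(f"alignment S = {score:.3f}")
```

In this sketch the same `affective_alignment` call could be applied to a distribution over moral foundations instead of emotions, since only the category list changes.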