Comparing Moral Values in Western English-speaking societies and LLMs with Word Associations

Chaoyi Xiang, Chunhua Liu, Simon De Deyne, Lea Frermann

TL;DR

The paper addresses how to assess moral values in large language models by inferring internal representations through word associations, focusing on Western English-speaking cultures. It builds two Global Moral Networks (gmn-h from the human association graph wa-h and gmn-l from the LLM association graph wa-l) by propagating Moral Foundations seeds across a global association graph using a random-walk framework, with the update rule $F_{t+1} = \alpha S F_t + (1 - \alpha) F_0$, where $S = D^{-1/2} W D^{-1/2}$. The study finds substantial alignment with human moral judgments on positive dimensions but systematic divergences on negative concepts, with humans showing more emotionality and concreteness than the LLMs. The framework provides a scalable, cross-cultural tool for evaluating AI moral representations and guiding safer, more aligned AI deployment, while acknowledging cultural constraints and suggesting pathways for broader cultural generalization and downstream behavioral interpretation.
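The propagation rule quoted in the TL;DR can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the conventional reading of the symbols, which the summary does not spell out: $W$ is a symmetric, non-negative association weight matrix, $D$ is its diagonal degree matrix, $F_0$ holds the initial seed scores, and $\alpha \in (0, 1)$ controls how much weight goes to neighbours versus the seeds. The function names, the value of `alpha`, and the four-word toy graph are all hypothetical.

```python
import numpy as np


def normalize_adjacency(W):
    """Return S = D^{-1/2} W D^{-1/2} for a symmetric, non-negative matrix W."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    # Isolated nodes (degree 0) keep a scaling factor of 0 to avoid division by zero.
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def propagate(W, F0, alpha=0.75, max_iter=1000, tol=1e-10):
    """Iterate F_{t+1} = alpha * S @ F_t + (1 - alpha) * F0 until the update is tiny."""
    S = normalize_adjacency(W)
    F0 = np.asarray(F0, dtype=float)
    F = F0.copy()
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + (1.0 - alpha) * F0
        if np.max(np.abs(F_next - F)) < tol:
            return F_next
        F = F_next
    return F


# Hypothetical toy graph: four words in a chain (0 - 1 - 2 - 3).
W = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)
# Assumed seeds: word 0 is a positive seed, word 3 a negative seed, the rest unlabeled.
F0 = np.array([1.0, 0.0, 0.0, -1.0])

alpha = 0.75
scores = propagate(W, F0, alpha=alpha)

# For alpha < 1 the iteration has the fixed point (1 - alpha) * (I - alpha * S)^{-1} F0.
S = normalize_adjacency(W)
closed_form = (1.0 - alpha) * np.linalg.solve(np.eye(len(F0)) - alpha * S, F0)
```

In this toy chain the unlabeled words inherit the sign of their nearest seed, with a smaller magnitude than the seed itself. Comparing `scores` with `closed_form` is a quick sanity check that the loop has converged.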

Abstract

As the impact of large language models increases, understanding the moral values they reflect becomes ever more important. Assessing the nature of moral values as understood by these models via direct prompting is challenging due to potential leakage of human norms into model training data, and the models' sensitivity to prompt formulation. Instead, we propose to use word associations, which have been shown to reflect moral reasoning in humans, as low-level underlying representations to obtain a more robust picture of LLMs' moral reasoning. We study moral differences in associations from Western English-speaking communities and LLMs trained predominantly on English data. First, we create a large dataset of LLM-generated word associations, resembling an existing dataset of human word associations. Next, we propose a novel method to propagate moral values based on seed words derived from Moral Foundations Theory through the human and LLM-generated association graphs. Finally, we compare the resulting moral conceptualizations, highlighting detailed but systematic differences between moral values emerging from English speakers and LLM associations.

Paper Structure

This paper contains 45 sections, 2 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: An illustration of moral information propagation (colored nodes and arrows) through word associations (gray edges). Information is propagated from the moral seed word 'mother' ($*$). The box on the right contains concepts directly connected to 'mother', while the box on the left illustrates information flow to a more distant area of the graph. Color reflects the inferred moral intensity of a concept.
  • Figure 2: Overview of our two-phase framework: (1) Collecting word association graphs from humans (wa-h) and Llama (wa-l); (2) Propagating moral information through the word association graphs to obtain two global moral networks (wa-h$\rightarrow$gmn-h; wa-l$\rightarrow$gmn-l), where red and blue nodes indicate words with negative and positive inferred moral scores, respectively.
  • Figure 3: Effect of temperature on differences in variability (blue) and reliability (red) between wa-l and wa-h (0 is best).
  • Figure 4: Precision@K for wa-l and Word2Vec associations relative to wa-h. Shaded regions show standard deviation over 50 runs. Correlation scores are noted.
  • Figure 5: Precision@K for wa-h and wa-l associations.
  • ...and 2 more figures