Differences in the Moral Foundations of Large Language Models

Peter Kirgis

TL;DR

This paper tackles the opacity of normative judgments in large language models by applying Moral Foundations Theory (MFT) to a synthetic cross-provider evaluation using Moral Foundations Vignettes (MFV). It analyzes responses from a broad set of providers and compares them to a nationally representative human baseline, employing descriptive statistics, rank correlations, PCA, and FrameAxis linguistic analysis. Key findings show that LLMs weight moral foundations differently from humans, with a liberal tilt toward care, fairness, and liberty and a drift away from binding foundations as models become larger and more capable, though provider-level differences persist. The work argues for more MFT-informed alignment research and policy discussion to address the normative implications of AI systems in politics, business, and education.

Abstract

Large language models are increasingly being used in critical domains of politics, business, and education, but the nature of their normative ethical judgment remains opaque. Alignment research has, to date, not sufficiently utilized perspectives and insights from the field of moral psychology to inform training and evaluation of frontier models. I perform a synthetic experiment on a wide range of models from most major model providers using Jonathan Haidt's influential moral foundations theory (MFT) to elicit diverse value judgments from LLMs. Using multiple descriptive statistical approaches, I document the bias and variance of large language model responses relative to a human baseline in the original survey. My results suggest that models rely on different moral foundations from one another and from a nationally representative human baseline, and these differences increase as model capabilities increase. This work seeks to spur further analysis of LLMs using MFT, including finetuning of open-source models, and greater deliberation by policymakers on the importance of moral foundations for LLM alignment.

Paper Structure

This paper contains 18 sections, 5 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Relevance of moral foundations across political identity, from Graham et al. (2009)
  • Figure 2: Mean Differences between Model Provider and Average Human Scores
  • Figure 3: Rank Correlation Matrix for Mean Foundation Values by Model Provider
  • Figure 4: Biplot of PCA Scores and Average Foundation Loadings
  • Figure 5: Mean Intensity of Foundation Language in Justifications