
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity

Gabriel Simmons

TL;DR

This study investigates whether large language models exhibit moral mimicry, that is, whether they generate moral rationalizations aligned with liberal or conservative political identities. It draws on Moral Foundations Theory and three moral foundations dictionaries (MFDv1, MFDv2, and eMFD). By prompting models with scenario content, identity cues, and moral stances, the author quantifies foundation use and shows that LLMs tend to mirror the moral biases associated with the prompted identity, with effect sizes generally in the directions predicted by the Moral Foundations Hypothesis. The work also finds that mimicry strength grows with model size in several families (notably OPT and GPT-3.5), and that LLM outputs diverge from human consensus more than individual humans do. These findings underscore both the potential influence of LLMs on public discourse and the need for careful evaluation and mitigation. Overall, the results show how conditioning on political identity shapes the moral foundations an LLM invokes, and highlight the role of model design and data curation in morally consequential language generation.
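The prompting step crosses each scenario with identity cues and moral stances. A minimal Python sketch of building such a prompt grid follows. The template string, identity phrases, and stance phrases are hypothetical placeholders rather than the paper's exact wording, and `build_prompts` and `Prompt` are illustrative names.

```python
from itertools import product
from typing import NamedTuple

# HYPOTHETICAL template and phrases; the paper's exact wording is not reproduced here.
TEMPLATE = "As a {identity}, I think that {scenario} is {stance}, because"
IDENTITIES = {"liberal": "liberal", "conservative": "conservative"}
STANCES = {"positive": "morally acceptable", "negative": "morally wrong"}


class Prompt(NamedTuple):
    scenario_id: int  # index of the scenario in the input list
    identity: str     # identity key, e.g. "liberal"
    stance: str       # stance key, e.g. "positive"
    text: str         # the filled-in prompt sent to the model


def build_prompts(scenarios, identities=IDENTITIES, stances=STANCES, template=TEMPLATE):
    """Cross every scenario with every identity phrase and every stance phrase."""
    prompts = []
    for (i, scenario), (id_key, id_phrase), (st_key, st_phrase) in product(
        enumerate(scenarios), identities.items(), stances.items()
    ):
        text = template.format(identity=id_phrase, scenario=scenario, stance=st_phrase)
        prompts.append(Prompt(i, id_key, st_key, text))
    return prompts
```

Keeping the scenario, identity, and stance keys next to each prompt makes it easy to pair liberal and conservative completions for the same scenario and stance when effect sizes are computed later.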

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in generating fluent text, as well as tendencies to reproduce undesirable social biases. This study investigates whether LLMs reproduce the moral biases associated with political groups in the United States, an instance of a broader capability herein termed moral mimicry. This hypothesis is explored in the GPT-3/3.5 and OPT families of Transformer-based LLMs. Using tools from Moral Foundations Theory, it is shown that these LLMs are indeed moral mimics. When prompted with a liberal or conservative political identity, the models generate text reflecting corresponding moral biases. This study also explores the relationship between moral mimicry and model size, and similarity between human and LLM moral word use.
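Figure 4 quantifies the relationship between moral mimicry and model size with Pearson's correlation between an MFH-Score and model parameters. A minimal plain-Python sketch of that kind of correlation is shown below. The MFH-Score is not defined in this section, so `scores` stands in for any per-model mimicry score. Correlating against log10 of the parameter count is an assumption (the paper may use raw counts), and `size_correlation` is a hypothetical helper; if SciPy is available, `scipy.stats.pearsonr` also returns the p-value.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def size_correlation(param_counts: Sequence[float], scores: Sequence[float],
                     log_scale: bool = True) -> float:
    """Correlate a per-model score with model size (log10 parameters by default, an assumption)."""
    sizes = [math.log10(p) for p in param_counts] if log_scale else list(param_counts)
    return pearson_r(sizes, scores)
```

Log-scaling is a common choice when model sizes span several orders of magnitude, since it keeps the largest model from dominating the fit.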

Paper Structure

This paper contains 34 sections, 13 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: An example of the experimental methods. Prompts are constructed from scenarios, identity phrases, and stances, combined in a template (see the Prompt Construction section). LLMs generate text completions from the prompts (see the Text Generation section). The completions are analyzed for their moral foundation content using the moral foundations dictionaries (see the Text Evaluation section). Differences between texts generated under liberal and conservative prompting are used to calculate effect sizes.
  • Figure 2: Left: Foundation expression probabilities for foundation-specific examples vs. average foundation use across all examples (text-davinci-002; Social Chemistry Actions scenarios). Right: Differences of the LLM and of individual humans from human-consensus foundation use, in response to scenarios from the Social Chemistry Situations dataset (text-davinci-002).
  • Figure 3: Effect sizes for liberal vs. conservative political identity for OPT-30B, text-davinci-001, text-davinci-002, and text-davinci-003. Dot markers represent average effect size. Error bars represent 95% CI. Shaded regions indicate the effect-size directions expected under the Moral Foundations Hypothesis.
  • Figure 4: Top: Effect size vs. model parameters, based on completions obtained from the Moral Stories dataset. Dark lines show mean effect size; error bars show 95% CI. Effect sizes are averaged over the three moral foundations dictionaries. 002: text-davinci-002; 003: text-davinci-003. Bottom: MFH-Score vs. model parameters; r, p: Pearson's correlation coefficient and p-value between MFH-Score and model parameters. †: Results of the correlation analysis with GPT-3 and GPT-3.5 models analyzed together.
  • Figure 5: Venn diagram of word overlap between MFDv1, MFDv2 and eMFD. Since some entries in MFDv1 are regexes, I represent MFDv1 in this diagram by all non-compound words in WordNet matching a regex in MFDv1.
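As Figure 1 describes, completions are scored against moral foundations dictionaries, and liberal vs. conservative differences become effect sizes. The Python sketch below illustrates that last step under stated assumptions: binary word matching against a tiny hypothetical lexicon whose trailing-`*` stems mimic the wildcard entries of MFDv1, and a Cohen's d effect size with pooled standard deviation, computed per foundation. The real MFDv1, MFDv2, and eMFD lexicons are much larger, eMFD uses weighted rather than binary scores, and the paper's exact scoring and effect-size formula may differ. `TOY_DICTIONARY`, `compile_entry`, `foundation_use`, `cohens_d`, and `effect_sizes` are illustrative names.

```python
import math
import re
from statistics import mean, stdev

# HYPOTHETICAL toy lexicon in the MFDv1 style, where a trailing "*" marks a stem.
TOY_DICTIONARY = {
    "care": ["harm*", "protect*", "suffer*"],
    "fairness": ["fair*", "equal*", "justice"],
    "loyalty": ["loyal*", "betray*", "nation*"],
    "authority": ["obey*", "tradition*", "respect*"],
    "sanctity": ["pure*", "sacred", "disgust*"],
}


def compile_entry(entry: str) -> re.Pattern:
    """Turn a stem entry like 'harm*' into a regex meant to match a whole token."""
    return re.compile(re.escape(entry).replace(r"\*", r"\w*"))


def foundation_use(text: str, dictionary=TOY_DICTIONARY) -> dict:
    """Fraction of word tokens in `text` that match each foundation's entries."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {f: 0.0 for f in dictionary}
    patterns = {f: [compile_entry(e) for e in entries] for f, entries in dictionary.items()}
    return {
        f: sum(any(p.fullmatch(t) for p in pats) for t in tokens) / len(tokens)
        for f, pats in patterns.items()
    }


def cohens_d(a, b) -> float:
    """mean(a) - mean(b) divided by the pooled SD (needs at least 2 samples each)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    pooled_sd = math.sqrt(pooled_var)
    return (mean(a) - mean(b)) / pooled_sd if pooled_sd > 0 else 0.0


def effect_sizes(liberal_texts, conservative_texts, dictionary=TOY_DICTIONARY) -> dict:
    """Per-foundation effect size; positive means more use under liberal prompting."""
    lib = [foundation_use(t, dictionary) for t in liberal_texts]
    con = [foundation_use(t, dictionary) for t in conservative_texts]
    return {f: cohens_d([s[f] for s in lib], [s[f] for s in con]) for f in dictionary}
```

With this sign convention, the Moral Foundations Hypothesis would lead one to expect positive effect sizes for care and fairness and negative ones for loyalty, authority, and sanctity, which is the pattern the shaded regions in Figure 3 encode.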