Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity
Gabriel Simmons
TL;DR
This study investigates whether large language models exhibit moral mimicry, generating moral rationalizations aligned with liberal or conservative political identities, using Moral Foundations Theory and three dictionary-based measures. By prompting models with scenario content, identity cues, and moral stances, the author quantifies foundation use and shows that LLMs tend to mirror the moral biases associated with the prompted identity, with effect sizes generally aligning with the Moral Foundations Hypothesis. The work also shows that mimicry strength grows with model size in several families (notably OPT and GPT-3.5) and that LLM outputs diverge from human consensus more than humans diverge from each other, underscoring both the potential influence of LLMs on public discourse and the need for careful evaluation and mitigation. Overall, the results illuminate how conditioning on political identity interacts with moral foundation use in LLMs and highlight the importance of model design and data curation in shaping morally consequential language generation.
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in generating fluent text, as well as tendencies to reproduce undesirable social biases. This study investigates whether LLMs reproduce the moral biases associated with political groups in the United States, an instance of a broader capability herein termed moral mimicry. This hypothesis is explored in the GPT-3/3.5 and OPT families of Transformer-based LLMs. Using tools from Moral Foundations Theory, it is shown that these LLMs are indeed moral mimics: when prompted with a liberal or conservative political identity, the models generate text reflecting the corresponding moral biases. This study also explores the relationship between moral mimicry and model size, and the similarity between human and LLM moral word use.
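To make the idea of a dictionary-based measure of moral word use concrete, the following is a minimal sketch, not the study's actual measure. It assumes a tiny hand-written word list (`TOY_FOUNDATION_WORDS`) and a hypothetical helper `foundation_use`; the real analysis relies on established Moral Foundations dictionaries, which are not reproduced here. The sketch scores a text by the fraction of its tokens that match each foundation's word list.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; this is NOT one of the
# dictionaries used in the study. Keys are the five moral foundations.
TOY_FOUNDATION_WORDS = {
    "care": {"harm", "suffer", "protect", "compassion"},
    "fairness": {"fair", "equal", "justice", "rights"},
    "loyalty": {"loyal", "betray", "patriot", "nation"},
    "authority": {"obey", "respect", "tradition", "law"},
    "sanctity": {"pure", "sacred", "disgust", "holy"},
}


def foundation_use(text, dictionary=TOY_FOUNDATION_WORDS):
    """Return, per foundation, the fraction of tokens found in its word list."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for foundation, words in dictionary.items():
            if token in words:
                counts[foundation] += 1
    total = len(tokens)
    # Normalize by text length so long and short texts are comparable.
    return {f: (counts[f] / total if total else 0.0) for f in dictionary}


scores = foundation_use("It is fair to protect equal rights.")
```

Under this sketch, comparing the scores of texts generated with a liberal identity cue against those generated with a conservative cue would indicate which foundations each condition emphasizes. Exact-match counting is a simplification: it ignores word stems, negation, and context.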
