Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment

Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Alice Oh, Meeyoung Cha

TL;DR

This work investigates how persona-conditioned prompts influence large language model alignment with human moral judgments in moral machine scenarios. By constructing sociodemographic personas across seven categories and applying a standardized persona prompt, the authors compare LLM responses to human baselines using AMCE-derived profiles and define Moral Decision Distance (MDD) as the Euclidean separation between persona AMCE vectors. Across three prominent LLMs (GPT-4o, GPT-3.5, Llama2), results show substantial persona-driven shifts, with political personas producing the largest alignment deviations from human judgments, especially for GPT-4o. The study highlights ethical risks of deploying morally sensitive AI and calls for more robust, multidimensional persona settings and broader scenario exploration to ensure alignment with diverse human values.
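As a minimal sketch of the Moral Decision Distance described above, the snippet below computes the Euclidean distance between two AMCE vectors. The function name, the dimension labels, and the numeric values are assumptions made up for illustration; they are not taken from the paper, and the authors' exact formulation (for example, which pairs of personas are compared) may differ.

```python
import math


def moral_decision_distance(amce_a, amce_b):
    """Euclidean distance between two AMCE vectors keyed by dimension name.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if amce_a.keys() != amce_b.keys():
        raise ValueError("AMCE vectors must cover the same dimensions")
    return math.sqrt(sum((amce_a[k] - amce_b[k]) ** 2 for k in amce_a))


# Made-up AMCE values for two opposing personas (not results from the paper).
persona_x = {"species": 0.6, "age": 0.4, "count": 0.5}
persona_y = {"species": 0.3, "age": 0.0, "count": 0.5}

mdd = moral_decision_distance(persona_x, persona_y)
print(round(mdd, 3))
```

A larger value means the two personas produce more dissimilar preference profiles. A value of 0 means identical AMCE estimates on every dimension.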

Abstract

Deploying large language models (LLMs) with agency in real-world applications raises critical questions about how these models will behave. In particular, how will their decisions align with those of humans when faced with moral dilemmas? This study examines the alignment between LLM-driven decisions and human judgment in various contexts of the moral machine experiment, including personas reflecting different sociodemographics. We find that the moral decisions of LLMs vary substantially by persona, with larger shifts on critical tasks than those observed among humans. Our data also indicate an interesting partisan sorting phenomenon, in which the political persona dominates the direction and degree of LLM decisions. We discuss the ethical implications and risks associated with deploying these models in applications that involve moral decisions.

Paper Structure

This paper contains 30 sections, 4 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Analysis setting for exploring the persona-dependent LLM alignment in the moral machine experiment introduced by Awad et al. (2018).
  • Figure 2: Comparison between responses from humans (Awad et al., 2018) and responses from GPT-4o, GPT-3.5, and Llama2. The gray-shaded radar for the Human class represents the aggregate results for all the participants. The gray-shaded radar for LLMs represents the baseline results when the scenario is prompted with no assigned persona.
  • Figure 3: Moral machine distance for different subgroups for human, GPT-4o, GPT-3.5, and Llama2 responses.
  • Figure 4: Comparison of moral preferences highlighted across nine dimensions for different models with assigned personas. A value of 0 on the y-axis indicates no preference between the two groups. The solid horizontal line represents the default setting without personas. Bars represent results for 14 personas per dimension, with red and blue pairs showing opposing personas. Human moral decisions do not flip across any persona, while GPT-4o, GPT-3.5, and Llama2 show preferences below 0 on more dimensions, indicating moral decisions opposite to those of humans.
  • Figure 5: The percentage of decisions showing a shift from the human baseline for the studied LLMs. GPT-4o exhibits better alignment with humans, with a low percentage of misaligned decisions. In contrast, the other two LLMs show around 20 percent of moral decisions misaligned with human responses.
  • ...and 2 more figures