Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment
Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Alice Oh, Meeyoung Cha
TL;DR
This work investigates how persona-conditioned prompts influence the alignment of large language models (LLMs) with human moral judgments in Moral Machine scenarios. The authors construct sociodemographic personas across seven categories, apply a standardized persona prompt, and compare LLM responses to human baselines using profiles built from average marginal component effects (AMCEs). They define the Moral Decision Distance (MDD) as the Euclidean distance between the AMCE vectors of two personas. Across three prominent LLMs (GPT-4o, GPT-3.5, Llama2), the results show substantial persona-driven shifts, with political personas producing the largest deviations from human judgments, especially for GPT-4o. The study highlights the ethical risks of deploying LLMs in morally sensitive applications and calls for more robust, multidimensional persona settings and broader scenario exploration to ensure alignment with diverse human values.
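The MDD described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each persona is summarized as a vector of AMCE values over the same set of dimensions. The function name, the dimension names, and the numbers are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def moral_decision_distance(amce_a, amce_b):
    """Euclidean distance between two AMCE vectors keyed by dimension name.

    Hypothetical helper: the name and the dict-based layout are assumptions
    made for illustration, not taken from the paper.
    """
    if amce_a.keys() != amce_b.keys():
        raise ValueError("AMCE vectors must cover the same dimensions")
    return math.sqrt(sum((amce_a[k] - amce_b[k]) ** 2 for k in amce_a))


# Made-up AMCE values for two personas (illustrative only, not real results).
persona_x = {"sparing_more": 0.50, "sparing_young": 0.30, "sparing_humans": 0.60}
persona_y = {"sparing_more": 0.20, "sparing_young": 0.70, "sparing_humans": 0.60}

mdd = moral_decision_distance(persona_x, persona_y)
print(f"MDD = {mdd:.3f}")
```

With these made-up values the two personas differ by 0.3 and 0.4 on two dimensions and agree on the third, so the distance comes out at roughly 0.5. A larger MDD would indicate that two personas lead the model to more dissimilar moral preferences.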
Abstract
Deploying large language models (LLMs) with agency in real-world applications raises critical questions about how these models will behave. In particular, how will their decisions align with those of humans when the models face moral dilemmas? This study examines the alignment between LLM-driven decisions and human judgment in various contexts of the Moral Machine experiment, including personas reflecting different sociodemographics. We find that the moral decisions of LLMs vary substantially by persona, showing greater shifts in moral decisions for critical tasks than humans do. Our data also indicate an interesting partisan sorting phenomenon, in which the political persona dominates the direction and degree of LLM decisions. We discuss the ethical implications and risks associated with deploying these models in applications that involve moral decisions.
