Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment

Karina Vida, Fabian Damken, Anne Lauscher

TL;DR

This work investigates whether multilingual LLMs exhibit moral biases and whether those biases align with human judgments. By extending the Moral Machine Experiment to ten languages and multiple models, it measures language-dependent average marginal component effect (AMCE) profiles, root-mean-square error (RMSE), and MAB against human baselines. The findings show pervasive, language- and model-specific biases: several models (notably Llama 3 70B-Instruct) show strong preferences that do not align with human ones, while others approach random choice, and the biases do not map consistently onto cultural dispositions. These results underscore the need for careful, language-aware prompting and highlight the complexity of deploying morally consequential AI across diverse linguistic communities. The study provides a formal definition of moral bias and lays out a data-driven framework for cross-language ethical evaluation of LLMs, with implications for trust, safety, and governance of multilingual AI systems.

Abstract

Large language models (LLMs) increasingly find their way into the most diverse areas of our everyday lives. They indirectly influence people's decisions or opinions through their daily use. Therefore, understanding how and which moral judgements these LLMs make is crucial. However, morality is not universal and depends on the cultural background. This raises the question of whether these cultural preferences are also reflected in LLMs when prompted in different languages or whether moral decision-making is consistent across different languages. So far, most research has focused on investigating the inherent values of LLMs in English. While a few works conduct multilingual analyses of moral bias in LLMs, these analyses do not go beyond atomic actions. To the best of our knowledge, a multilingual analysis of moral bias in dilemmas has not yet been conducted. To address this, our paper builds on the Moral Machine Experiment (MME) to investigate the moral preferences of five LLMs (Falcon, Gemini, Llama, GPT, and MPT) in a multilingual setting and compares them with the preferences collected from humans belonging to different cultures. To accomplish this, we generate 6500 scenarios of the MME and prompt the models in ten languages on which action to take. Our analysis reveals that all LLMs exhibit different moral biases to some degree and that they differ not only from human preferences but also across languages within the same model. Moreover, we find that almost all models, particularly Llama 3, diverge greatly from human values and, for instance, prefer saving fewer people over saving more.

Paper Structure (35 sections, 9 figures, 8 tables)

Figures (9)

  • Figure 1: Illustration of a moral dilemma presented to a large language model. Note that the images are illustrative; the actual prompt is textual. The presented scenario forces the model to either (1) run over a dog or (2) run over an old man, a female and a male executive, a boy, and a cat. The presented response (2) is from Llama 3 and is representative of its moral bias. Scenario renderings are taken from https://www.moralmachine.net/. The robot is from https://pixabay.com/vectors/character-creature-robot-2023874/.
  • Figure 2: Clustering of languages based on the AMCE. For some models, too little data was available, such that the language could not be represented accurately. The coloured hatching in the background of each plot denotes the primary cluster that we associate the language with according to the MME.
  • Figure 3: Moral bias of Llama 3 70B-Instruct. Each radial axis depicts one factor of the experiment.
  • Figure 4: Moral bias of Gemini 1.0 Pro; see Figure 3 for more details.
  • Figure 5: Moral bias of GPT 3.5 Turbo; see Figure 3 for more details.
  • ...and 4 more figures