Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in
Utkarsh Agarwal, Kumar Tanmay, Aditi Khandelwal, Monojit Choudhury
TL;DR
This study extends the ethical probing framework of Rao et al. (2023) to a multilingual setting, evaluating GPT-4, ChatGPT, and Llama2-70B-Chat across six languages (English, Spanish, Russian, Chinese, Hindi, Swahili) to assess whether moral reasoning depends on the prompt language. It introduces a multilingual ethical framework with prompts and Level 0/1/2 policies, defines ethical consistency via an entailment-like relation, and evaluates the models with both baseline and policy-driven prompts. The results show that GPT-4 generally achieves the most consistent cross-language ethical reasoning, while ChatGPT and Llama2-70B-Chat display significant language-induced biases, especially in Hindi and Swahili; Level-2 policies tend to yield stronger performance across models. The work highlights cross-lingual value alignment challenges in LLMs and provides a methodology for multilingual ethical evaluation that can guide future alignment research and policy design. It also underscores the practical implications of deploying LLMs in culturally diverse settings, pointing to translation biases and resource gaps as key areas for improvement.
Abstract
Ethical reasoning is a crucial skill for Large Language Models (LLMs). However, moral values are not universal, but rather influenced by language and culture. This paper explores how three prominent LLMs -- GPT-4, ChatGPT, and Llama2-70B-Chat -- perform ethical reasoning in different languages and whether their moral judgements depend on the language in which they are prompted. We extend the study of ethical reasoning of LLMs by Rao et al. (2023) to a multilingual setup, following their framework of probing LLMs with ethical dilemmas and policies from three branches of normative ethics: deontology, virtue ethics, and consequentialism. We experiment with six languages: English, Spanish, Russian, Chinese, Hindi, and Swahili. We find that GPT-4 is the most consistent and unbiased ethical reasoner across languages, while ChatGPT and Llama2-70B-Chat show significant moral value bias when we move to languages other than English. Interestingly, the nature of this bias varies significantly across languages for all LLMs, including GPT-4.
