Diverse Human Value Alignment for Large Language Models via Ethical Reasoning

Jiahao Wang; Songkai Xue; Jinghui Li; Xiaozhen Wang

TL;DR

This paper addresses the challenge of aligning large language models with diverse human values across cultures. It introduces a theory-grounded five-step ethical reasoning paradigm (Gather Facts, Identify Social Norms, Generate Options, Evaluate Options, and Reflect) that incorporates four ethical lenses (Deontology, Common Good, Utilitarianism, Justice) to enable deliberative, culturally aware judgments. The framework can be implemented via prompt engineering or supervised fine-tuning, and evaluations on the SafeWorld benchmark show significant gains in Norm Identification ($S_{norm}$) and Value Alignment ($S_{align}$) over baselines. The work lays a principled foundation for interpretable, robust, and scalable alignment of LLMs to global value pluralism, and outlines concrete directions for improving social-norm knowledge and the evaluation of ethical reasoning processes.

Abstract

Ensuring that Large Language Models (LLMs) align with the diverse and evolving human values of different regions and cultures remains a critical challenge in AI ethics. Current alignment approaches often yield superficial conformity rather than genuine ethical understanding, failing to address the complex, context-dependent nature of human values. In this paper, we propose a novel ethical reasoning paradigm for LLMs, inspired by well-established ethical decision-making models, that aims to enhance diverse human value alignment through deliberative ethical reasoning. Our framework consists of a structured five-step process: contextual fact gathering, hierarchical social norm identification, option generation, multiple-lens ethical impact analysis, and reflection. This theory-grounded approach guides LLMs through an interpretable reasoning process that enhances their ability to understand regional specificities and perform nuanced ethical analysis, and it can be implemented with either prompt engineering or supervised fine-tuning. We evaluate the framework on the SafeWorld benchmark, which is specifically designed for regional value alignment. Experimental results demonstrate that our framework significantly improves LLM alignment with diverse human values compared to baseline methods, enabling more accurate social norm identification and more culturally appropriate reasoning. Our work provides a concrete pathway toward developing LLMs that align more effectively with the multifaceted values of global societies through interdisciplinary research.
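To make the prompt-engineering route concrete, the following is a minimal sketch of how the five steps and four lenses might be assembled into a single prompt. The step and lens names come from the text above; everything else, including the function name `build_ethical_reasoning_prompt`, the instruction wording for each step, and the prompt layout, is an assumption made for illustration and is not the authors' actual prompt.

```python
from typing import List, Tuple

# Step names follow the five-step paradigm described above.
# The instruction wording for each step is an illustrative assumption.
STEPS: List[Tuple[str, str]] = [
    ("Gather Facts",
     "List the facts relevant to the query, including the region or culture involved."),
    ("Identify Social Norms",
     "Name the social norms that apply, moving from broad norms to region-specific ones."),
    ("Generate Options",
     "Propose several candidate responses or courses of action."),
    ("Evaluate Options",
     "Assess the impact of each option through each of the lenses listed below."),
    ("Reflect",
     "Review the analysis, then state and justify the final response."),
]

# Lens names as given in the TL;DR.
LENSES: List[str] = ["Deontology", "Common Good", "Utilitarianism", "Justice"]


def build_ethical_reasoning_prompt(query: str) -> str:
    """Return a prompt that walks a model through the five steps in order.

    Hypothetical helper: the layout is one possible rendering, not the
    template used in the paper.
    """
    lines = [
        "Answer the user query by reasoning through the following steps in order.",
        "",
    ]
    for number, (name, instruction) in enumerate(STEPS, start=1):
        lines.append(f"Step {number} ({name}): {instruction}")
        # The lenses are attached only to the evaluation step.
        if name == "Evaluate Options":
            lines.extend(f"  - {lens} lens" for lens in LENSES)
    lines.extend(["", f"User query: {query}"])
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_ethical_reasoning_prompt("Is tipping expected at restaurants in Japan?"))
```

The resulting string could be sent to any chat model as a system or user message. For the supervised fine-tuning route, the same structure could plausibly serve as the target format for training responses, though the abstract does not specify how the training data is constructed.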
