
Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?

Jingyan Zhou, Minda Hu, Junan Li, Xiaoying Zhang, Xixin Wu, Irwin King, Helen Meng

TL;DR

This work argues for a theory-guided, top-down framework that enables LLMs to perform moral reasoning grounded in well-established theories from normative ethics and moral psychology. By prompting models with structured theory-guided instructions (TI) for Justice, Deontology, Utilitarianism, and the Theory of Dyadic Morality (TDM), the authors show that LLMs can understand and follow different moral theories, and that their judgments align with human judgments on both normative and commonsense morality datasets. No single theory outperforms the others across all datasets; cross-cultural variants (e.g., TDM-En) improve alignment, while annotation quality and the amount of context in each scenario significantly affect misalignment. Overall, the results support explainable, flexible top-down approaches as a viable path toward ethical AI, while highlighting dataset and model limitations that warrant further interdisciplinary work.
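
To make the idea of a theory-guided instruction concrete, here is a minimal sketch of how a prompt could be assembled per theory and how a verdict could be read back from the model's reply. The instruction texts, the function names `build_theory_prompt` and `parse_judgment`, and the "acceptable/unacceptable" answer format are all hypothetical illustrations, not the prompts or labels used by the authors. The actual model call is left out so the sketch does not assume any particular LLM API.

```python
from typing import Optional

# Hypothetical one-line paraphrases of each theory, written for illustration.
# These are NOT the theory-guided instructions (TI) used by the authors.
THEORY_INSTRUCTIONS = {
    "justice": "Judge whether the agent treats others impartially and gives them what they deserve.",
    "deontology": "Judge whether the action respects or violates the duties and rules that bind the agent.",
    "utilitarianism": "Judge whether the action increases or decreases the overall well-being of everyone affected.",
    "tdm": "Judge whether an intentional agent causes harm to a vulnerable patient.",
}


def build_theory_prompt(theory: str, scenario: str) -> str:
    """Assemble a hypothetical theory-guided prompt for a single scenario."""
    if theory not in THEORY_INSTRUCTIONS:
        raise ValueError(f"unknown theory: {theory!r}")
    return (
        f"Moral theory: {theory}\n"
        f"Instruction: {THEORY_INSTRUCTIONS[theory]}\n"
        f"Scenario: {scenario}\n"
        "Explain your reasoning under this theory, then give a final line "
        "'Answer: acceptable' or 'Answer: unacceptable'."
    )


def parse_judgment(model_output: str) -> Optional[str]:
    """Read the verdict from the last non-empty line of the model's reply."""
    lines = [ln.strip().lower() for ln in model_output.splitlines() if ln.strip()]
    if not lines:
        return None
    last = lines[-1]
    # Check "unacceptable" first, because it contains "acceptable" as a substring.
    if "unacceptable" in last:
        return "unacceptable"
    if "acceptable" in last:
        return "acceptable"
    return None
```

For example, `build_theory_prompt("tdm", "I pushed a stranger into a lake.")` yields a prompt that can be sent to any LLM; `parse_judgment` then maps the reply's final line to a verdict, or `None` if no verdict is found. Asking for the reasoning before the verdict mirrors the framework's emphasis on explainable judgments.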

Abstract

Making moral judgments is an essential step toward developing ethical AI systems. Prevalent approaches are mostly implemented in a bottom-up manner: they use large sets of annotated data to train models on crowd-sourced opinions about morality. These approaches have been criticized for overgeneralizing the moral stances of a limited group of annotators and for lacking explainability. This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research. The theory-guided top-down framework can incorporate various moral theories. Our experiments demonstrate the effectiveness of the proposed framework on datasets derived from moral theories. Furthermore, we show the alignment between different moral theories and existing morality datasets. Our analysis reveals both the potential and the flaws of existing resources (models and datasets) for developing explainable moral judgment-making systems.
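
One way to picture "alignment between moral theories and morality datasets" is as per-theory agreement with the dataset's human labels. The sketch below assumes alignment is measured as plain agreement (accuracy); the paper may use other metrics, and the function name `agreement_by_theory` and the toy records are hypothetical, not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def agreement_by_theory(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Fraction of items where the theory-guided judgment matches the human label.

    Each record is a (theory, model_judgment, human_label) triple.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for theory, judgment, label in records:
        totals[theory] += 1
        hits[theory] += int(judgment == label)
    return {theory: hits[theory] / totals[theory] for theory in totals}


# Toy records invented for illustration; they are not results from the paper.
toy = [
    ("deontology", "unacceptable", "unacceptable"),
    ("deontology", "acceptable", "unacceptable"),
    ("utilitarianism", "acceptable", "acceptable"),
    ("utilitarianism", "unacceptable", "unacceptable"),
]
scores = agreement_by_theory(toy)
print(scores)  # {'deontology': 0.5, 'utilitarianism': 1.0}
```

Computing these scores separately for each dataset makes it easy to compare theories side by side, which is the kind of comparison behind the observation that no single theory aligns best everywhere.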

Paper Structure (44 sections, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Given a scenario, the results from the popular bottom-up approach (a) and the proposed theory-guided top-down approach (b) for moral judgment.
  • Figure 2: Error analysis result.