The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas

Giovanni Franco Gabriel Marraffini, Andrés Cotton, Noe Fabian Hsueh, Axel Fridman, Juan Wisznia, Luciano Del Corro

TL;DR

The paper introduces the Greatest Good Benchmark (GGB), a utilitarian-focused evaluation of LLM moral judgments by adapting the Oxford Utilitarian Scale and expanding its dataset with bias-mitigating prompt variations. Across 15 diverse models, the study finds a robust pattern of strong rejection of instrumental harm and endorsement of impartial beneficence, with larger models more closely resembling lay judgments but not aligning with scholarly theories. The results reveal an ‘artificial morality’ in LLMs and highlight model size as a key moderator, offering actionable insights for future alignment work and dataset development. The work provides a transparent, reproducible framework and public data/code to advance understanding of LLM moral biases and their implications for real-world deployment.

Abstract

The question of how to make decisions that maximise the well-being of all persons is highly relevant to designing language models that are beneficial to humanity and free from harm. We introduce the Greatest Good Benchmark to evaluate the moral judgments of LLMs using utilitarian dilemmas. Our analysis across 15 diverse LLMs reveals consistently encoded moral preferences that diverge from established moral theories and lay population moral standards. Most LLMs have a marked preference for impartial beneficence and rejection of instrumental harm. These findings showcase the 'artificial moral compass' of LLMs, offering insights into their moral alignment.

Paper Structure

This paper contains 36 sections, 6 figures, 6 tables.

Figures (6)

  • Figure 1: OUS results for professional philosophers who adhere to different moral theories and for the lay population, as reported by Kahane et al. (2018), with standard error bars.
  • Figure 2: Instruction example of the GGB.
  • Figure 3: Comparison of models, philosophical theories, and lay population with IB and IH mean values and standard errors.
  • Figure 4: Histogram of variance for each IH or IB and model.
  • Figure 5: Models at temperature 0 and the lay population, plotted by their mean IB and IH values with the corresponding standard errors.
  • ...and 1 more figures