ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs

Rohan Subramanian Thomas, Shikhar Shiromani, Abdullah Chaudhry, Ruizhe Li, Vasu Sharma, Kevin Zhu, Sunishchal Dev

TL;DR

ProMoral-Bench presents a unified, model-agnostic benchmark to compare prompting strategies for moral reasoning and safety across four LLM families and four task datasets. It introduces a Unified Moral Safety Score (UMSS) that harmonizes moral competence and safety robustness, evaluated via ETHICS, Scruples, ETHICS-Contrast, and WildJailbreak under 11 prompting paradigms. Results show compact, exemplar-guided prompts consistently outperform complex multi-turn reasoning, delivering higher UMSS with lower token costs, while multi-turn pipelines are brittle under perturbations. The findings emphasize that prompting design, tailored to model biases, offers a practical path to principled, cost-effective alignment in high-stakes moral reasoning applications.

Abstract

Prompt design significantly impacts the moral competence and safety alignment of large language models (LLMs), yet empirical comparisons remain fragmented across datasets and models. We introduce ProMoral-Bench, a unified benchmark evaluating 11 prompting paradigms across four LLM families. Using ETHICS, Scruples, WildJailbreak, and our new robustness test, ETHICS-Contrast, we measure performance via our proposed Unified Moral Safety Score (UMSS), a metric balancing accuracy and safety. Our results show that compact, exemplar-guided scaffolds outperform complex multi-stage reasoning, providing higher UMSS scores and greater robustness at a lower token cost. While multi-turn reasoning proves fragile under perturbations, few-shot exemplars consistently enhance moral stability and jailbreak resistance. ProMoral-Bench establishes a standardized framework for principled, cost-effective prompt engineering.

Paper Structure (46 sections, 1 figure, 21 tables)

Figures (1)

  • Figure 1: UMSS vs. Token Cost. Scatter plot of UMSS score against average tokens per example (log scale) for eleven prompting strategies. Green shading indicates the optimal zone; the dotted line marks the efficiency threshold; the dashed line shows the least-squares regression line (LSRL).
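
The trend line described in the Figure 1 caption can be illustrated with a minimal sketch of an ordinary least-squares fit of score against log10 of token cost. The function name `fit_lsrl_log_x` and the demo values are hypothetical placeholders and are not taken from the paper; the authors' actual fitting procedure is not given in this section.

```python
import math


def fit_lsrl_log_x(tokens, scores):
    """Fit score = slope * log10(tokens) + intercept by ordinary least squares.

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(tokens) != len(scores) or len(tokens) < 2:
        raise ValueError("need at least two (tokens, score) pairs of equal length")

    # The x-axis in the figure is logarithmic, so regress on log10(tokens).
    xs = [math.log10(t) for t in tokens]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n

    # Sum of squared deviations in x and the x-y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all token counts are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder values for illustration only; NOT results from the paper.
demo_tokens = [10, 100, 1000]
demo_scores = [0.9, 0.8, 0.7]
slope, intercept = fit_lsrl_log_x(demo_tokens, demo_scores)
```

With these placeholder points, a negative slope would mean the score falls as token cost grows by a factor of ten; whether the paper's fitted line has that sign is not stated in the caption.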