
Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms

Yuxi Sun, Wei Gao, Hongzhan Lin, Jing Ma, Wenxuan Zhang

TL;DR

The paper tackles the challenge of explainable ethical assessment of human actions under conflicting social norms. It introduces ClarityEthic, a two-stage framework that (i) pre-trains three task-specific language models for rationale generation, norm generation, and valence prediction, using LLM-derived rationales and human data, and (ii) fine-tunes the generators with a contrastive objective that aligns norm representations across related actions. The approach improves valence prediction on Moral Stories (MoSt) and ETHICS and generates two-path (supporting and opposing) norms and rationales that explain its decisions, with support from both automatic metrics and human evaluation. The work highlights both the potential and the limits of explicit normative reasoning, and points to broader cultural coverage and integration with diverse LLMs as future directions.
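The TL;DR mentions a contrastive objective that aligns norm representations across related actions, but the summary does not give the loss itself. The sketch below therefore uses a standard InfoNCE loss as an assumed stand-in: the norm embedding of a related action is treated as the positive and other norm embeddings as negatives. The function names, the cosine similarity, and the temperature of 0.1 are illustrative assumptions, not ClarityEthic's published formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero and equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor norm embedding (assumed stand-in objective).

    Returns -log of the softmax probability assigned to the positive, so the
    loss falls toward 0 as the anchor moves closer to the positive (e.g. a
    norm embedding for a related action) than to the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    # Positive identical to the anchor: loss is close to 0.
    print(info_nce(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))
    # Positive orthogonal while a negative matches the anchor: loss is about 10.
    print(info_nce(anchor, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]]))
```

Minimizing this loss pulls norm embeddings of related actions together and pushes unrelated ones apart; how ClarityEthic defines "related" actions and negatives is not specified here.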

Abstract

Human behaviors are often guided or constrained by social norms, which are defined as shared, commonsense rules. For example, underlying the action "report a witnessed crime" are social norms that inform our conduct, such as "It is expected to be brave to report crimes". Current AI systems that assess the valence (i.e., support or opposition) of human actions through large-scale data training, without grounding in explicit norms, may be difficult to explain and thus untrustworthy. Emulating human assessors by considering social norms can help AI models better understand and predict valence. While multiple norms come into play, conflicting norms can create tension and directly influence human behavior. For example, when deciding whether to "report a witnessed crime", one may balance bravery against self-protection. In this paper, we introduce ClarityEthic, a novel ethical assessment approach that enhances valence prediction and explanation by generating the conflicting social norms behind human actions, and that strengthens the moral reasoning capabilities of language models through a contrastive learning strategy. Extensive experiments demonstrate that our method outperforms strong baseline approaches, and human evaluations confirm that the generated social norms provide plausible explanations for the assessment of human behaviors.

Paper Structure

This paper contains 42 sections, 5 equations, 4 figures, 14 tables.

Figures (4)

  • Figure 1: Different social norms support or oppose everyday situations to varying degrees. ClarityEthic is designed to assess and explain how conflicting social norms may influence human behaviors.
  • Figure 2: We first elicit supporting and opposing rationales from LLMs; ClarityEthic is then trained in two steps: 1) pre-training three task-specific language models; 2) fine-tuning the generators using contrastive learning. During inference, ClarityEthic predicts the valence of specific actions and generates corresponding two-path social norms and rationales to explain its ethical assessment.
  • Figure 3: Human evaluation of rationales generated by ChatGPT and ClarityEthic on MoSt.
  • Figure 4: An example of the user study questionnaire.
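Figure 2's caption describes inference as predicting a valence and generating two-path (presumably supporting and opposing) social norms and rationales with three task-specific models. The following is a rough, hypothetical sketch of that data flow only: the interfaces `gen_rationale`, `gen_norm`, and `score` are toy callables, and both the order of calls and the rule that the higher-scoring path determines the valence are assumptions rather than the paper's specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical interfaces for the three task-specific models in Figure 2.
# In ClarityEthic these would be trained language models; here they are
# plain callables so the data flow can be run end to end.
RationaleGenerator = Callable[[str, str], str]  # (action, path) -> rationale
NormGenerator = Callable[[str, str], str]       # (action, rationale) -> norm
ValenceScorer = Callable[[str, str], float]     # (action, norm) -> plausibility

PATHS = ("support", "oppose")


@dataclass
class Assessment:
    valence: str                # the path judged more plausible (assumed rule)
    rationales: Dict[str, str]  # path -> generated rationale
    norms: Dict[str, str]       # path -> generated social norm


def assess(action: str,
           gen_rationale: RationaleGenerator,
           gen_norm: NormGenerator,
           score: ValenceScorer) -> Assessment:
    """Run both paths, then pick the valence whose norm scores highest."""
    rationales: Dict[str, str] = {}
    norms: Dict[str, str] = {}
    scores: Dict[str, float] = {}
    for path in PATHS:
        rationales[path] = gen_rationale(action, path)
        norms[path] = gen_norm(action, rationales[path])
        scores[path] = score(action, norms[path])
    valence = max(PATHS, key=lambda p: scores[p])
    return Assessment(valence=valence, rationales=rationales, norms=norms)


if __name__ == "__main__":
    # Toy stand-ins built around the abstract's running example.
    def toy_rationale(action: str, path: str) -> str:
        return f"{path}: reasons to {path} '{action}'"

    def toy_norm(action: str, rationale: str) -> str:
        if rationale.startswith("support"):
            return "It is expected to be brave to report crimes"
        return "It is reasonable to protect yourself from harm"

    def toy_score(action: str, norm: str) -> float:
        return 0.8 if "brave" in norm else 0.3

    result = assess("report a witnessed crime", toy_rationale, toy_norm, toy_score)
    print(result.valence)  # support
```

Keeping both paths in the returned object mirrors the idea that an explanation should show the conflicting norms side by side, not only the winning one.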