Diagnosing Moral Reasoning Acquisition in Language Models: Pragmatics and Generalization

Guangliang Liu, Zimo Qi, Xitong Zhang, Lei Jiang, Kristen Marie Johnson

TL;DR

The paper investigates whether current learning paradigms can equip LLMs with moral reasoning, arguing that a pragmatic dilemma—arising from the gap between distributional semantics and the pragmatic nature of morality—limits generalization. It analyzes three downstream tasks (Moral Foundations classification, Rule of Thumb generation, ethical judgment prediction) and compares to a semantics baseline (sentiment analysis), introducing the Representational Likelihood Algorithm (RLA) to link training similarity to unseen judgments. Fine-tuning experiments show gains largely tied to distributional similarities rather than intrinsic moral understanding, with mechanistic analysis confirming the persistent pragmatic barrier even when additional information is used. The work highlights fundamental constraints in current AI alignment approaches and suggests grounding and hybrid strategies to enable robust moral reasoning.

Abstract

Ensuring that Large Language Models (LLMs) return just responses which adhere to societal values is crucial for their broader application. Prior research has shown that LLMs often fail to perform satisfactorily on tasks requiring moral cognizance, such as ethics-based judgments. While current approaches have focused on fine-tuning LLMs with curated datasets to improve their capabilities on such tasks, choosing the optimal learning paradigm to enhance the ethical responses of LLMs remains an open research debate. In this work, we aim to address this fundamental question: can current learning paradigms enable LLMs to acquire sufficient moral reasoning capabilities? Drawing from distributional semantics theory and the pragmatic nature of moral discourse, our analysis indicates that performance improvements follow a mechanism similar to that of semantic-level tasks, and therefore remain affected by the pragmatic nature of morals latent in discourse, a phenomenon we name the pragmatic dilemma. We conclude that this pragmatic dilemma imposes significant limitations on the generalization ability of current learning paradigms, making it the primary bottleneck for moral reasoning acquisition in LLMs.

Paper Structure

This paper contains 18 sections, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: Training and Development Accuracy Over 10 Fine-tuning Epochs. The first four figures display results for moral foundation classification tasks, while the rightmost figure shows the results for the SST benchmark.
  • Figure 2: Convergence Curve of Development Accuracy for Considered Classification Tasks. Only the development accuracy of SST increases with more training samples and finally approaches 1.0.
  • Figure 3: Top-10 generalization-supportive training samples analysis for fine-tuned Mistral with the SocialChem (upper two rows) and MIC (bottom two rows) benchmarks.
  • Figure 4: Ratio of generalization-supportive training situations with the same underlying moral foundation as the test situation. The upper two subfigures are for SocialChem and the bottom two subfigures are for MIC. Top-50 situations are available in the appendix.
  • Figure 5: Perplexity for Mistral. Baseline indicates the Perplexity of the LLMs without any fine-tuning.
  • ...and 5 more figures