The Moral Consistency Pipeline: Continuous Ethical Evaluation for Large Language Models

Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel, Negar Shahabi, Foutse Khomh

TL;DR

MoCoP addresses the challenge of maintaining moral coherence in evolving LLMs by offering a dataset-free, closed-loop framework for continuous ethical evaluation. It combines three analytical layers (lexical integrity, semantic risk estimation, and reasoning-based judgment modeling) within an autonomous system that generates, evaluates, and refines ethical scenarios without external supervision. The paper introduces a formal framework and a meta-analytic ethics engine to quantify moral drift, cross-model coherence, and temporal stability, demonstrated on GPT-4-Turbo and DeepSeek. Findings reveal a strong inverse relationship between ethical and toxicity scores and the temporal invariance of moral reasoning, supporting MoCoP as a scalable tool for continuous auditing of computational morality.
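The closed-loop structure summarized above (generate a scenario, evaluate the model's response through three layers, feed the results back into the next round) can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' implementation: the class and function names, the [0, 1] score scale, the equal-weight aggregation into a single ethical score, and the specific drift and stability formulas are all assumptions made for illustration. In a real pipeline each scorer would wrap an LLM call or a trained classifier.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, List

# Hypothetical scorer type: maps a model response to a score in [0, 1].
Scorer = Callable[[str], float]


@dataclass
class RoundResult:
    scenario: str
    response: str
    lexical: float   # layer (i): lexical integrity, higher = cleaner wording (assumed)
    risk: float      # layer (ii): semantic risk, higher = riskier (assumed)
    judgment: float  # layer (iii): reasoning-based judgment, higher = better (assumed)

    @property
    def ethical_score(self) -> float:
        # Assumed aggregation: equal-weight mean of the three layers,
        # with risk inverted so that every term rewards safer behavior.
        return (self.lexical + (1.0 - self.risk) + self.judgment) / 3.0


@dataclass
class MoralPipelineSketch:
    generate: Callable[[List[RoundResult]], str]  # proposes next scenario from history
    model: Callable[[str], str]                   # the LLM under test
    lexical_integrity: Scorer
    semantic_risk: Scorer
    judgment_score: Scorer
    history: List[RoundResult] = field(default_factory=list)

    def step(self) -> RoundResult:
        # One loop iteration: generate a scenario, query the model,
        # score the response with all three layers, and record it.
        # The stored history is what the generator sees next round.
        scenario = self.generate(self.history)
        response = self.model(scenario)
        result = RoundResult(
            scenario=scenario,
            response=response,
            lexical=self.lexical_integrity(response),
            risk=self.semantic_risk(response),
            judgment=self.judgment_score(response),
        )
        self.history.append(result)
        return result

    def drift(self) -> float:
        # Hypothetical drift measure: mean ethical score of the later half
        # of the history minus that of the earlier half (negative = decline).
        scores = [r.ethical_score for r in self.history]
        half = len(scores) // 2
        if half == 0:
            return 0.0
        return mean(scores[half:]) - mean(scores[:half])

    def stability(self) -> float:
        # Hypothetical stability measure: population standard deviation of
        # ethical scores across rounds (0.0 = perfectly stable).
        scores = [r.ethical_score for r in self.history]
        return pstdev(scores) if scores else 0.0


if __name__ == "__main__":
    # Stub components stand in for real LLM calls and classifiers.
    pipe = MoralPipelineSketch(
        generate=lambda hist: f"scenario {len(hist)}",
        model=lambda s: f"response to {s}",
        lexical_integrity=lambda r: 0.9,
        semantic_risk=lambda r: 0.1,
        judgment_score=lambda r: 0.8,
    )
    for _ in range(4):
        pipe.step()
    print(pipe.drift(), pipe.stability())
```

Keeping each layer as an injectable callable makes this sketch model-agnostic: evaluating a different model, or substituting a different risk classifier, changes only the constructor arguments, not the loop itself.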

Abstract

The rapid advancement and adaptability of Large Language Models (LLMs) highlight the need for moral consistency, the capacity to maintain ethically coherent reasoning across varied contexts. Existing alignment frameworks, structured approaches designed to align model behavior with human ethical and social norms, often rely on static datasets and post-hoc evaluations, offering limited insight into how ethical reasoning may evolve across different contexts or temporal scales. This study presents the Moral Consistency Pipeline (MoCoP), a dataset-free, closed-loop framework for continuously evaluating and interpreting the moral stability of LLMs. MoCoP combines three supporting layers: (i) lexical integrity analysis, (ii) semantic risk estimation, and (iii) reasoning-based judgment modeling, within a self-sustaining architecture that autonomously generates, evaluates, and refines ethical scenarios without external supervision. Our empirical results on GPT-4-Turbo and DeepSeek suggest that MoCoP effectively captures longitudinal ethical behavior, revealing a strong inverse relationship between ethical and toxicity dimensions (r_ET = -0.81, p < 0.001) and a near-zero association between ethical scores and response latency (r_EL ≈ 0). These findings demonstrate that moral coherence and linguistic safety tend to emerge as stable and interpretable characteristics of model behavior rather than short-term fluctuations. Furthermore, by reframing ethical evaluation as a dynamic, model-agnostic form of moral introspection, MoCoP offers a reproducible foundation for scalable, continuous auditing and advances the study of computational morality in autonomous AI systems.
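The coefficients r_ET and r_EL reported above are correlations between per-response scores. The abstract does not name the estimator; assuming the standard Pearson product-moment correlation, the computation looks like the sketch below. The function name and the score arrays are hypothetical toy values, not data from the paper, and the sketch omits the p-value (a library routine such as SciPy's `scipy.stats.pearsonr` reports one alongside the coefficient).

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sum of co-deviations (covariance up to a factor of 1/n).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Root sums of squared deviations (standard deviations up to sqrt(n)).
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical per-response scores (toy values, NOT data from the paper).
ethical = [0.9, 0.9, 0.6, 0.6, 0.3, 0.3]
toxicity = [0.05, 0.15, 0.35, 0.30, 0.70, 0.60]
latency_s = [1.1, 1.3, 1.1, 1.3, 1.1, 1.3]  # response latency in seconds

r_et = pearson_r(ethical, toxicity)   # strongly negative: toxicity falls as ethics rises
r_el = pearson_r(ethical, latency_s)  # ~0: latency alternates independently of ethics here
print(f"r_ET = {r_et:.2f}, r_EL = {r_el:.2f}")
```

In these toy arrays every ethical level is paired with both latency values, which is why r_EL comes out at zero up to floating-point error, mirroring the kind of decoupling the abstract reports.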

Paper Structure

This paper contains 29 sections, 22 equations, 7 figures, 4 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Schematic of the architecture of the MoCoP framework. The pipeline integrates lexical integrity analysis, semantic risk estimation, and reasoning-based judgment modeling in a continuous feedback loop for evaluating LLM moral consistency. Arrows indicate feedback flow across layers.
  • Figure 2: Safety category distribution across GPT-4-Turbo and DeepSeek models.
  • Figure 3: Distribution of ethical scores across models.
  • Figure 4: Comparison of ethical score stability across models.
  • Figure 5: Correlation between ethical and toxicity scores.
  • ...and 2 more figures