Contextual Moral Value Alignment Through Context-Based Aggregation

Pierre Dognin, Jesus Rios, Ronny Luss, Inkit Padhi, Matthew D Riemer, Miao Liu, Prasanna Sattigeri, Manish Nagireddy, Kush R. Varshney, Djallel Bouneffouf

TL;DR

CMVA tackles context-dependent ethics in LLMs by aggregating outputs from multiple Moral Value Agents guided by a user’s Moral Profile Vector $c$. The CMVA-GS framework casts moral alignment as a Multi-Objective Reinforcement Learning (MORL) problem with a reward vector $r \in \mathbb{R}^N$ and a contextual objective $J_c(\pi)$, solved by training per-value agents and a Contextual Aggregator. Each Moral Agent is trained via PPO-based RL with KL regularization to prevent drift. Evaluations on the Moral Integrity Corpus show CMVA-GS achieving higher alignment scores, illustrating a practical path toward context-aware, multi-value AI systems.
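
As a hedged illustration of the formulation summarized above (this summary does not state the exact forms, so both expressions are assumptions): a common way to turn a vector reward into a contextual objective is linear scalarization by the profile vector, and a common KL-regularized RL objective for a single agent penalizes divergence from a reference policy. The symbols $\gamma$, $\beta$, $\pi_i$, $r_i$, and $\pi_{\mathrm{ref}}$ are illustrative and not taken from the source.

$$J_c(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, c^{\top} r_t\Big], \qquad c,\ r_t \in \mathbb{R}^N$$

$$\max_{\pi_i}\ \mathbb{E}_{x,\ y \sim \pi_i(\cdot \mid x)}\big[r_i(x, y)\big] - \beta\, \mathrm{KL}\big(\pi_i(\cdot \mid x)\ \|\ \pi_{\mathrm{ref}}(\cdot \mid x)\big)$$

Under this reading, a profile vector that puts all its weight on one value recovers that value's single-objective problem, while mixed weights trade the values off against each other.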

Abstract

Developing value-aligned AI agents is a complex undertaking and an ongoing challenge in the field of AI. Specifically within the domain of Large Language Models (LLMs), the capability to consolidate multiple independently trained dialogue agents, each aligned with a distinct moral value, into a unified system that can adapt to and be aligned with multiple moral values is of paramount importance. In this paper, we propose a system that performs contextual moral value alignment based on contextual aggregation. Here, aggregation is defined as the process of integrating a subset of LLM responses that are best suited to respond to a user input, taking into account features extracted from the user's input. The proposed system shows better results in terms of alignment to human values compared to the state of the art.
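
The aggregation idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`select_agents`, `aggregate`, `scalarize`), the threshold rule, and the value names are all hypothetical, and the final integration step, which a real system would perform with a trained aggregator model, is replaced by simply returning the selected responses in order of profile weight.

```python
from typing import Dict, List


def select_agents(profile: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Return the moral values whose profile weight exceeds `threshold`,
    ordered from highest to lowest weight (hypothetical selection rule)."""
    active = [(value, weight) for value, weight in profile.items() if weight > threshold]
    active.sort(key=lambda pair: pair[1], reverse=True)
    return [value for value, _ in active]


def aggregate(
    profile: Dict[str, float],
    responses: Dict[str, str],
    threshold: float = 0.0,
) -> List[str]:
    """Keep only the responses of agents selected by the profile.

    Placeholder for a learned aggregator: here we return the selected
    responses as a list instead of generating one fused response.
    """
    return [responses[v] for v in select_agents(profile, threshold) if v in responses]


def scalarize(profile: Dict[str, float], rewards: Dict[str, float]) -> float:
    """Linear scalarization of per-value rewards by the profile weights
    (an assumed form of the contextual objective, not stated in the source)."""
    return sum(weight * rewards.get(value, 0.0) for value, weight in profile.items())


if __name__ == "__main__":
    # Illustrative profile and per-agent responses (made-up data).
    profile = {"care": 0.7, "fairness": 0.3, "loyalty": 0.0}
    responses = {
        "care": "Check in on how they are feeling first.",
        "fairness": "Make sure everyone is treated the same way.",
        "loyalty": "Stand by your group.",
    }
    print(aggregate(profile, responses))
```

With this profile, the zero-weight value is dropped and the remaining responses are ordered by weight, which mirrors the "subset of LLM responses best suited to the user input" wording in the abstract.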

Paper Structure

This paper contains 8 sections, 2 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Contextual Moral-Value Alignment Generative System
  • Figure 2: Evaluation using ROUGE on MIC.