Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making

Rohit K. Dubey, Damian Dailisan, Sachit Mahajan

TL;DR

This work tackles moral uncertainty in AI decision-making by introducing AMULED, a two-layer reinforcement learning framework that appends a task-agnostic ethical layer guided by large language models. It formalizes multiple moral perspectives as Basic Belief Assignments and fuses them with Belief Jensen–Shannon Divergence and Dempster–Shafer Theory to generate shaping rewards that balance the primary goal against diverse ethical sub-goals. The approach is evaluated on two toy domains, "Finding Milk" and "Driving and Rescuing", and is reported to show improved consistency and adaptability compared with handcrafted rewards and other belief-aggregation baselines, with GPT-4o-mini typically delivering the strongest performance among the LLMs tested. While the approach is promising for low-stakes, scalable ethical reasoning, the paper notes limitations in LLM spatial reasoning, potential biases, and the need for interpretability and human-in-the-loop validation in high-stakes deployments.

Abstract

We present an ethical decision-making framework that refines a pre-trained reinforcement learning (RL) model using a task-agnostic ethical layer. Following initial training, the RL model undergoes ethical fine-tuning, where human feedback is replaced by feedback generated from a large language model (LLM). The LLM embodies consequentialist, deontological, virtue, social justice, and care ethics as moral principles to assign belief values to recommended actions during ethical decision-making. An ethical layer aggregates belief scores from multiple LLM-derived moral perspectives using Belief Jensen-Shannon Divergence and Dempster-Shafer Theory into probability scores that also serve as the shaping reward, steering the agent toward choices that align with a balanced ethical framework. This integrated learning framework helps the RL agent navigate moral uncertainty in complex environments and enables it to make morally sound decisions across diverse tasks. Our approach, tested across different LLM variants and compared with other belief aggregation techniques, demonstrates improved consistency, adaptability, and reduced reliance on handcrafted ethical rewards. This method is especially effective in dynamic scenarios where ethical challenges arise unexpectedly, making it well-suited for real-world applications.
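The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each moral perspective supplies a Basic Belief Assignment (BBA) as a dictionary mapping `frozenset` hypotheses (here, candidate actions) to masses that sum to 1, weights the BBAs by their average Belief Jensen-Shannon divergence from the others, and then applies Dempster's rule of combination to the weighted average. The function names (`dempster_combine`, `bjs_divergence`, `fuse_beliefs`) and the small `eps` constant are hypothetical choices made for illustration, and refinements used in published BJS-based fusion schemes, such as entropy-based weight adjustment, are omitted.

```python
import math
from itertools import product


def dempster_combine(m1, m2):
    """Combine two BBAs with Dempster's rule (keys are frozensets)."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("BBAs are in total conflict and cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def bjs_divergence(m1, m2):
    """Belief Jensen-Shannon divergence between two BBAs (log base 2)."""
    total = 0.0
    for focal in set(m1) | set(m2):
        p, q = m1.get(focal, 0.0), m2.get(focal, 0.0)
        mid = (p + q) / 2.0
        if p > 0.0:
            total += 0.5 * p * math.log2(p / mid)
        if q > 0.0:
            total += 0.5 * q * math.log2(q / mid)
    return total


def fuse_beliefs(bbas, eps=1e-12):
    """Weight BBAs by agreement with the others, then fuse with Dempster's rule."""
    n = len(bbas)
    if n == 1:
        return dict(bbas[0])
    # Average divergence of each BBA from all the others.
    avg_div = [
        sum(bjs_divergence(bbas[i], bbas[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    # Low divergence means high support; eps (an assumption) avoids division by zero.
    support = [1.0 / (d + eps) for d in avg_div]
    credibility = [s / sum(support) for s in support]
    # Credibility-weighted average BBA.
    averaged = {}
    for weight, bba in zip(credibility, bbas):
        for focal, mass in bba.items():
            averaged[focal] = averaged.get(focal, 0.0) + weight * mass
    # Combine the averaged BBA with itself n - 1 times.
    fused = averaged
    for _ in range(n - 1):
        fused = dempster_combine(fused, averaged)
    return fused


if __name__ == "__main__":
    left, right = frozenset({"left"}), frozenset({"right"})
    perspectives = [
        {left: 0.6, right: 0.4},  # e.g. one moral perspective
        {left: 0.6, right: 0.4},  # a second perspective that agrees
        {left: 0.2, right: 0.8},  # a dissenting perspective
    ]
    print(fuse_beliefs(perspectives))
```

In this sketch the fused masses over single actions can be read as probability-like scores; how they are scaled into the shaping reward is left to the framework described in the paper.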

Paper Structure

This paper contains 36 sections, 7 equations, 6 figures.

Figures (6)

  • Figure 1: a) Schematic diagram of the AMULED framework, which uses Reinforcement Learning with AI Feedback. The small colored boxes show which blocks have state $s$ or reward $r$ values as inputs. b) LLM prompt template used for the different moral clusters. c) Different ethical frameworks used as moral clusters. Pseudo-code of the framework is detailed in the paper's pseudocode algorithm.
  • Figure 2: a) Metrics evaluating performance of the agent on the primary goal (left-most) and sub-goals. AMULED learns the ethical task to near-perfection, although a hand-crafted shaping reward performs the best for this environment. Error bands reflect 95% confidence intervals of the mean. b) AMULED is compared to the performance of agents prompted to act like pure moral clusters, and a "moral agent". These values are measured from 50 episodes each. c) Illustration of one of the trajectories learned by AMULED.
  • Figure 3: a) Metrics evaluating the performance of the agent on the primary goal (left-most) and sub-goals. AMULED manages the tradeoffs between its conflicting goals much better than the other baselines. Error bands reflect 95% confidence intervals of the mean. b) Comparison of AMULED with the performance of agents prompted to act like pure moral clusters, and a "moral agent". These values are measured from 50 episodes each. c) Illustration of the Driving and Rescuing environment.
  • Figure 4: Comparison of AMULED with the performance of agents trained with alternative belief aggregation functions. The two left panels are for FindMilk, while the two right panels are for Driving. These values are measured from 50 episodes each.
  • Figure 5: Performance of AMULED using different LLMs as the moral agents. The two left panels are for FindMilk, while the two right panels are for Driving. OpenAI's GPT-4o-mini performed the best among LLMs across the different metrics, which is why it was used to produce the results of the rest of the paper.
  • ...and 1 more figure