Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making
Rohit K. Dubey, Damian Dailisan, Sachit Mahajan
TL;DR
This work tackles moral uncertainty in AI decision-making by introducing AMULED, a two-layer reinforcement learning framework that appends a task-agnostic ethical layer guided by large language models. It formalizes multiple moral perspectives into Basic Belief Assignments, fusing them with Belief Jensen–Shannon Divergence and Dempster–Shafer Theory to generate shaping rewards that balance primary goals with diverse ethical sub-goals. The approach is evaluated on two toy domains, Finding Milk and Driving and Rescuing, showing improved consistency and adaptability over handcrafted rewards and other belief-aggregation baselines, with GPT-4o-mini typically delivering the strongest performance. While promising for low-stakes, scalable ethical reasoning, the paper notes limitations in LLM spatial reasoning, potential biases, and the need for interpretability and human-in-the-loop validation for high-stakes deployments.
Abstract
We present an ethical decision-making framework that refines a pre-trained reinforcement learning (RL) model using a task-agnostic ethical layer. Following initial training, the RL model undergoes ethical fine-tuning, in which human feedback is replaced by feedback generated by a large language model (LLM). The LLM embodies five moral perspectives (consequentialist, deontological, virtue, social justice, and care ethics) and assigns belief values to recommended actions under each of them. An ethical layer aggregates these belief scores using Belief Jensen–Shannon Divergence and Dempster–Shafer Theory into probability scores that also serve as the shaping reward, steering the agent toward choices that align with a balanced ethical framework. This integrated learning framework helps the RL agent navigate moral uncertainty in complex environments and enables it to make morally sound decisions across diverse tasks. Our approach, tested across different LLM variants and compared with other belief aggregation techniques, demonstrates improved consistency, adaptability, and reduced reliance on handcrafted ethical rewards. This method is especially effective in dynamic scenarios where ethical challenges arise unexpectedly, making it well suited for real-world applications.
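As a rough illustration of the aggregation step described above, the following Python sketch fuses per-perspective belief assignments into a single probability vector over actions. It is not the authors' implementation. It assumes that every belief assignment places mass only on single actions (so Dempster's rule reduces to a normalized product), that each perspective is weighted by the inverse of its average Belief Jensen–Shannon divergence from the others, and that the weighted average is combined with itself n-1 times. All function names and numeric values are hypothetical.

```python
import math


def bjs_divergence(m1, m2):
    """Belief Jensen-Shannon divergence (log base 2) between two belief
    assignments given as lists of masses over the same singleton actions."""
    div = 0.0
    for a, b in zip(m1, m2):
        mid = 0.5 * (a + b)
        if a > 0:
            div += 0.5 * a * math.log2(a / mid)
        if b > 0:
            div += 0.5 * b * math.log2(b / mid)
    return div


def dempster_combine(m1, m2):
    """Dempster's rule of combination, restricted to the assumed case in
    which every focal element is a single action."""
    joint = [a * b for a, b in zip(m1, m2)]
    agreement = sum(joint)  # equals 1 - K, where K is the conflict mass
    if agreement == 0:
        raise ValueError("total conflict: the two sources share no support")
    return [j / agreement for j in joint]


def fuse_beliefs(bbas, eps=1e-12):
    """Credibility-weighted fusion of two or more belief assignments."""
    n = len(bbas)
    if n < 2:
        raise ValueError("need at least two belief assignments")
    # Average divergence of each perspective from all the others.
    avg_div = [
        sum(bjs_divergence(bbas[i], bbas[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    # Perspectives that agree with the rest receive more credibility
    # (eps avoids division by zero when all perspectives coincide).
    support = [1.0 / (d + eps) for d in avg_div]
    total = sum(support)
    credibility = [s / total for s in support]
    # Credibility-weighted average belief assignment.
    num_actions = len(bbas[0])
    weighted = [
        sum(credibility[i] * bbas[i][a] for i in range(n))
        for a in range(num_actions)
    ]
    # Combine the weighted average with itself n - 1 times.
    fused = weighted
    for _ in range(n - 1):
        fused = dempster_combine(fused, weighted)
    return fused


# Hypothetical beliefs over three candidate actions, one row per perspective.
beliefs = [
    [0.7, 0.2, 0.1],  # consequentialist
    [0.6, 0.3, 0.1],  # deontological
    [0.5, 0.3, 0.2],  # virtue
    [0.6, 0.2, 0.2],  # social justice
    [0.1, 0.2, 0.7],  # care (deliberately dissenting)
]
shaping_reward = fuse_beliefs(beliefs)
print(shaping_reward)
```

In this sketch the dissenting perspective is down-weighted rather than discarded, so the fused vector still reflects it while favoring the action most perspectives support. Under the assumptions stated above, the resulting vector could play the role of the shaping reward over actions.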
