Table of Contents
Fetching ...

Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates

Colin Shea-Blymyer, Houssam Abbas

TL;DR

The paper introduces a formal method to enforce ethical obligations in reinforcement learning by extending Expected Act Utilitarian deontic logic ($EAU$) with strategic obligations. It provides a two-part toolkit: a model-checker that verifies whether a given policy satisfies obligations and a gradient-based policy updater that guides policies toward obligation satisfaction without changing the reward function. Using DAC-MDP abstractions and toy gridworlds (windy-drone) and a large CartPole DAC-MDP, the results show scalable verification and effective synthesis under $P_{\geq \rho}\phi$ constraints and related CTD structures. The work enables transparent safety trade-offs, guiding designers to specify normative constraints and refine policies accordingly.

Abstract

When designing agents for operation in uncertain environments, designers need tools to automatically reason about what agents ought to do, how that conflicts with what is actually happening, and how a policy might be modified to remove the conflict. These obligations include ethical and social obligations, permissions and prohibitions, which constrain how the agent achieves its mission and executes its policy. We propose a new deontic logic, Expected Act Utilitarian deontic logic, for enabling this reasoning at design time: for specifying and verifying the agent's strategic obligations, then modifying its policy from a reference policy to meet those obligations. Unlike approaches that work at the reward level, working at the logical level increases the transparency of the trade-offs. We introduce two algorithms: one for model-checking whether an RL agent has the right strategic obligations, and one for modifying a reference decision policy to make it meet obligations expressed in our logic. We illustrate our algorithms on DAC-MDPs which accurately abstract neural decision policies, and on toy gridworld environments.

Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates

TL;DR

The paper introduces a formal method to enforce ethical obligations in reinforcement learning by extending Expected Act Utilitarian deontic logic () with strategic obligations. It provides a two-part toolkit: a model-checker that verifies whether a given policy satisfies obligations and a gradient-based policy updater that guides policies toward obligation satisfaction without changing the reward function. Using DAC-MDP abstractions and toy gridworlds (windy-drone) and a large CartPole DAC-MDP, the results show scalable verification and effective synthesis under constraints and related CTD structures. The work enables transparent safety trade-offs, guiding designers to specify normative constraints and refine policies accordingly.

Abstract

When designing agents for operation in uncertain environments, designers need tools to automatically reason about what agents ought to do, how that conflicts with what is actually happening, and how a policy might be modified to remove the conflict. These obligations include ethical and social obligations, permissions and prohibitions, which constrain how the agent achieves its mission and executes its policy. We propose a new deontic logic, Expected Act Utilitarian deontic logic, for enabling this reasoning at design time: for specifying and verifying the agent's strategic obligations, then modifying its policy from a reference policy to meet those obligations. Unlike approaches that work at the reward level, working at the logical level increases the transparency of the trade-offs. We introduce two algorithms: one for model-checking whether an RL agent has the right strategic obligations, and one for modifying a reference decision policy to make it meet obligations expressed in our logic. We illustrate our algorithms on DAC-MDPs which accurately abstract neural decision policies, and on toy gridworld environments.
Paper Structure (22 sections, 10 equations, 11 figures, 1 table, 1 algorithm)

This paper contains 22 sections, 10 equations, 11 figures, 1 table, 1 algorithm.

Figures (11)

  • Figure 1: The windy-drone MDP. Darkened cells are inaccessible states. The goal state is an absorbing state. An agent in this MDP has 4 actions available to it in any state: up, down, left, and right. A chosen action has a 70% chance of success, and on a failure the agent "slips" in one of the unchosen directions. An action result that would move the agent into a wall, or other inaccessible state, leaves the agent in the state it acted from.
  • Figure 2: An EAU model for agent $\alpha$, showing moments $m < m'$ with histories $H_m = \{h_1, \dots, h_5\}$, and $H_{m'} = \{h_1, h_2, h_3\}$. The actions available in moment $m$ are $Choice_{\alpha}^{m}=\{K_1, K_2\}$, and in $m'$ are $Choice_{\alpha}^{m'}=\{K_3, K_4\}$. Action $K_1 = \{h_1, h_2, h_3\}$, $K_2 = \{h_4, h_5\}$, $K_3 = \{h_1\}$, and $K_4 = \{h_2, h_3\}$. Each history is labeled with the formula(s) it satisfies, and its values $Value(h)$; e.g., $h_1$ satisfies $A$ and has a value of 7. The probability of an action being able to effect a moment or history is also given; e.g. $Pr_{\alpha}(m' | m) = 1.0$, and $Pr_{\alpha}(h_2 | m') = 0.7$. The index $m/h_4 \models [\alpha \, cstit\!:B]$ since $Choice_{\alpha}^{m}(h_4) = K_2 = \{h_4, h_5\}$, and both $h_4$ and $h_5$ satisfy $B$. However, $m/h_4 \not\models [\alpha \, cstit\!:H]$ because $h_5$ does not satisfy $H$. Still, $m/h_4 \models [\alpha \, cstit\!:P_{\geq 0.5}[H]]$ since $Pr_{\alpha}(h_4 | m) \geq 0.5$. And $m/h_2 \models [\alpha \, cstit\!:P_{\leq 0.7}[H]]$ because $Choice_{\alpha}^{m}(h_2) = \{h_1, h_2, h_3\}$, $h_2$ is the only history among those that satisfies $H$ and $Pr_{\alpha}(h_2 | m) \leq 0.7$. The $Q(K_2) = 7.5$, while $Q(K_1) = 1.0 * \max \{Q(K_3), Q(K_4)\} = \max \{7.0, 6.8\} = 7.0$, so $E- Optimal_{\alpha}^{m} = \{K_2\}$. Hence $m/h_4 \models \otimes[\alpha \, cstit\!:B]$. Finally, $E- Optimal_{\alpha}^{m'} = \{K_3\}$, so $m'/h_2 \not\models \otimes[\alpha \, cstit\!:B]$.
  • Figure 3: Gradient-based constrained policy search methods. The gradient of the expected utility of a system with respect to its policy (the utility gradient) is labeled $\nabla_\pi V$ and is red. This gradient points the policy in the direction that increases its expected utility. The gradient of the probability that the policy satisfies the specification with respect to its policy (the probability gradient) is labeled $\nabla_\pi f$ and is blue. This gradient points the policy in the direction that increases the probability with which it satisfies the given obligation. The policy that maximizes this probability is labeled $\pi^\varphi$, and the policy that maximizes expected utility is labeled $\pi^*$.
  • Figure 4: Line search experiment --- Expected utility (red) starts fairly high, but decreases as satisfaction probability (blue) decreases. It rebounds, however, as the policy moves closer to the reward optimal policy (at update 100). The satisfaction probability begins near 1.0, but decreases to near 0.
  • Figure 5: Average gradient experiment --- Expected utility (red) starts high, but decreases as satisfaction probability (blue) increases. The satisfaction probability increases to be near 1.0, while the expected utility drops slightly below 110.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Definition 1: Expected Ought
  • Definition 2: Strategic stit
  • Definition 3: Expected Strategic Ought