Safe Explicable Planning

Akkamahadevi Hanni; Andrew Boateng; Yu Zhang

TL;DR

The paper tackles safe explicable planning (SEP), addressing the tension between aligning AI behavior with human expectations and maintaining safety. It formalizes SEP over two MDPs: the agent's ground-truth model $\mathcal{M}_R$ and the human's estimated model of it, $\mathcal{M}_R^H$. The formulation includes a safety bound $\delta$ and an explicability objective based on the return the human expects under $\mathcal{M}_R^H$, and it yields a Pareto set of safe explicable policies $\Pi^*_{\mathcal{E}}$. The authors propose an action-pruning step and two search algorithms. PDT+ (policy descent tree search) computes the Pareto set, and PAG+ (policy ascent greedy search) more efficiently finds one policy in it. They also give approximate state-aggregation methods for scalability, formal proofs, and empirical validation in simulations and on a physical robot. Results show that SEP produces safe, explicable behavior across domains while mitigating policy explosion, supporting safer human-robot collaboration without sacrificing explicability.
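A minimal brute-force sketch can make the formulation concrete. It uses a hypothetical three-state toy MDP, not a domain from the paper, and rests on several illustrative assumptions:

- The safety bound is multiplicative and per-state: $V^{\pi}_{\mathcal{M}_R}(s) \ge \delta\, V^{*}_{\mathcal{M}_R}(s)$ for every state $s$.
- $\mathcal{M}_R^H$ shares the transitions of $\mathcal{M}_R$ and differs only in its rewards.
- Explicability is compared through the per-state value vector under $\mathcal{M}_R^H$.

The names evaluate, dominates, R_H and DELTA are hypothetical. Enumerating every policy is only feasible for tiny problems, which is exactly the policy explosion that pruning and the paper's search algorithms are meant to mitigate.

```python
import itertools
import numpy as np

# Hypothetical toy instance (not from the paper): 3 states, 2 actions, discount 0.9.
# P[a, s, s2] is a transition probability and R[s, a] an expected reward.
# ASSUMPTION: M_R^H shares P and differs from M_R only in its rewards R_H.
GAMMA = 0.9
DELTA = 0.85  # assumed multiplicative, per-state safety bound
P = np.array([
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # action 0
    [[0.5, 0.5, 0.0], [0.2, 0.0, 0.8], [0.0, 0.0, 1.0]],  # action 1
])
R = np.array([[1.0, 0.5], [2.0, 3.0], [0.0, 0.0]])    # rewards in M_R
R_H = np.array([[0.5, 2.0], [3.0, 1.0], [0.0, 0.0]])  # human-believed rewards in M_R^H
n_states, n_actions = R.shape


def evaluate(policy, rewards):
    """Exact value of a deterministic policy: solve (I - GAMMA * P_pi) V = R_pi."""
    P_pi = np.array([P[a, s] for s, a in enumerate(policy)])
    R_pi = np.array([rewards[s, a] for s, a in enumerate(policy)])
    return np.linalg.solve(np.eye(n_states) - GAMMA * P_pi, R_pi)


def dominates(u, v, tol=1e-9):
    """u Pareto-dominates v: no worse in every state, strictly better in at least one.
    The small tolerance absorbs floating-point noise."""
    return bool(np.all(u >= v - tol) and np.any(u > v + tol))


# Enumerate every deterministic policy (one action per state).
policies = list(itertools.product(range(n_actions), repeat=n_states))
V_R = {pi: evaluate(pi, R) for pi in policies}
V_H = {pi: evaluate(pi, R_H) for pi in policies}

# V* under M_R is the elementwise maximum over deterministic policies.
V_star = np.max([V_R[pi] for pi in policies], axis=0)

# Safety filter: V^pi(s) >= DELTA * V*(s) in every state (small float tolerance).
safe = [pi for pi in policies if np.all(V_R[pi] >= DELTA * V_star - 1e-9)]

# Explicability: keep the safe policies whose M_R^H value vectors are non-dominated.
pareto = [pi for pi in safe if not any(dominates(V_H[o], V_H[pi]) for o in safe)]
print("safe:", safe)
print("Pareto set:", pareto)
```

In this toy instance, the policy the human model favors (action 1 in state 0, action 0 in state 1) earns only the immediate reward 2 in state 1 under $\mathcal{M}_R$. That is below $0.85\, V^*(1)$, so the safety filter removes it, and the returned set contains only policies that are both safe and non-dominated under $\mathcal{M}_R^H$.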

Abstract

Human expectations arise from their understanding of others and the world. In the context of human-AI interaction, this understanding may not align with reality, leading the AI agent to fail to meet expectations and compromising team performance. Explicable planning, introduced as a method to bridge this gap, aims to reconcile human expectations with the agent's optimal behavior, facilitating interpretable decision-making. However, a critical unresolved issue is ensuring safety in explicable planning, since explicable planning alone can produce behaviors that are explicable but unsafe. To address this, we propose Safe Explicable Planning (SEP), which extends the prior work to support the specification of a safety bound. The goal of SEP is to find behaviors that align with human expectations while adhering to the specified safety criterion. Our approach considers multiple objectives stemming from multiple models, rather than a single model, yielding a Pareto set of safe explicable policies. We present both an exact method, which is guaranteed to find the Pareto set, and a more efficient greedy method that finds one of the policies in the Pareto set. Additionally, we offer approximate solutions based on state aggregation to improve scalability. We provide formal proofs that validate the desired theoretical properties of these methods. Evaluation through simulations and physical robot experiments confirms the effectiveness of our approach for safe explicable planning.
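The abstract contrasts an exact method with a cheaper greedy one. As a rough illustration of the greedy idea only, the sketch below runs a generic single-action hill climb. It is not the authors' PAG+ and, unlike PAG+, it carries no guarantee of returning a member of the Pareto set.

It relies on hypothetical assumptions:

- A toy MDP that is not from the paper.
- A per-state multiplicative bound $V^{\pi}_{\mathcal{M}_R}(s) \ge \delta\, V^*_{\mathcal{M}_R}(s)$.
- A scalarized explicability score equal to the sum of state values under $\mathcal{M}_R^H$.

The search starts from the optimal policy under $\mathcal{M}_R$, which satisfies the bound whenever rewards are non-negative. It then repeatedly applies the safe single-state action change that most increases the score. The names is_safe, score and greedy_ascent are illustrative.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): P[a, s, s2], rewards R[s, a] for M_R
# and ASSUMED human-believed rewards R_H for M_R^H (same transitions).
GAMMA, DELTA = 0.9, 0.85
P = np.array([
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # action 0
    [[0.5, 0.5, 0.0], [0.2, 0.0, 0.8], [0.0, 0.0, 1.0]],  # action 1
])
R = np.array([[1.0, 0.5], [2.0, 3.0], [0.0, 0.0]])
R_H = np.array([[0.5, 2.0], [3.0, 1.0], [0.0, 0.0]])
n_states, n_actions = R.shape


def evaluate(policy, rewards):
    """Exact value of a deterministic policy: solve (I - GAMMA * P_pi) V = R_pi."""
    P_pi = np.array([P[a, s] for s, a in enumerate(policy)])
    R_pi = np.array([rewards[s, a] for s, a in enumerate(policy)])
    return np.linalg.solve(np.eye(n_states) - GAMMA * P_pi, R_pi)


# Optimal values under M_R by value iteration; 1000 sweeps at GAMMA = 0.9 is ample.
V_star = np.zeros(n_states)
for _ in range(1000):
    V_star = (R + GAMMA * np.einsum("asj,j->sa", P, V_star)).max(axis=1)
Q_star = R + GAMMA * np.einsum("asj,j->sa", P, V_star)
start = tuple(int(a) for a in Q_star.argmax(axis=1))  # greedy (optimal) policy under M_R


def is_safe(policy):
    """Assumed bound V^pi(s) >= DELTA * V*(s) in every state, with a small tolerance."""
    return bool(np.all(evaluate(policy, R) >= DELTA * V_star - 1e-9))


def score(policy):
    """Hypothetical scalar explicability score: sum of state values under M_R^H."""
    return float(evaluate(policy, R_H).sum())


def greedy_ascent(policy):
    """Hill-climb over single-state action changes that keep the policy safe."""
    current, current_score = policy, score(policy)
    while True:
        best, best_score = None, current_score
        for s in range(n_states):
            for a in range(n_actions):
                if a == current[s]:
                    continue
                cand = current[:s] + (a,) + current[s + 1:]
                if is_safe(cand) and score(cand) > best_score + 1e-12:
                    best, best_score = cand, score(cand)
        if best is None:  # no safe single-action change improves the score
            return current
        current, current_score = best, best_score


result = greedy_ascent(start)
print("start:", start, "-> greedy result:", result)
```

Each accepted step strictly increases the score over a finite set of policies, so the loop terminates. The result is only a local optimum of this scalarized score, which is why the paper's dedicated algorithm and its guarantees are still needed.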

Paper Structure

This paper contains 17 sections, 4 theorems, 6 equations, 6 figures, 1 table, and 2 algorithms.

Introduction
RELATED WORK
PROBLEM FORMULATION
SAFE EXPLICABLE PLANNING
Policy Space Reduction via Action Pruning
Policy Descent Tree Search (PDT)
Policy Ascent Greedy Search (PAG)
Approximate Solution via State Aggregation
EVALUATION
Simulations
Domain Descriptions:
Results:
Physical Robot Experiment
Robot Assistant Domain:
Results:
...and 2 more sections

Key Result

Lemma 1

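The set of policies $\widetilde{\Pi}$ that remains after pruning actions according to Eqn. eq:Abar is a superset of the set of policies $\Pi_{\delta}$ that satisfy the constraint in Eqn. eq:exp_mdp, i.e., $\widetilde{\Pi} \supseteq \Pi_{\delta}$. Hence, pruning never discards a policy that satisfies the constraint.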
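Lemma 1 can be checked numerically on a small example. The sketch assumes a pruning rule that may differ from the paper's Eqn. eq:Abar: keep action $a$ in state $s$ only if $Q^*_{\mathcal{M}_R}(s,a) \ge \delta\, V^*_{\mathcal{M}_R}(s)$. It also takes $\Pi_\delta$ to be the policies with $V^{\pi}_{\mathcal{M}_R}(s) \ge \delta\, V^*_{\mathcal{M}_R}(s)$ in every state.

Under these assumptions, the superset property follows from $V^{\pi}(s) = Q^{\pi}(s,\pi(s)) \le Q^{*}(s,\pi(s))$. Any policy meeting the constraint therefore uses only retained actions. The converse does not hold in general, since combining retained actions need not satisfy the bound. The toy MDP and the names kept, pruned_space and safe are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical toy MDP (not from the paper): P[a, s, s2] transitions, R[s, a] rewards.
GAMMA, DELTA = 0.9, 0.9
P = np.array([
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # action 0
    [[0.5, 0.5, 0.0], [0.2, 0.0, 0.8], [0.0, 0.0, 1.0]],  # action 1
])
R = np.array([[1.0, 0.5], [2.0, 3.0], [0.0, 0.0]])
n_states, n_actions = R.shape


def evaluate(policy):
    """Exact value of a deterministic policy in M_R."""
    P_pi = np.array([P[a, s] for s, a in enumerate(policy)])
    R_pi = np.array([R[s, a] for s, a in enumerate(policy)])
    return np.linalg.solve(np.eye(n_states) - GAMMA * P_pi, R_pi)


policies = list(itertools.product(range(n_actions), repeat=n_states))
V_star = np.max([evaluate(pi) for pi in policies], axis=0)
Q_star = R + GAMMA * np.einsum("asj,j->sa", P, V_star)

# ASSUMED pruning rule: keep a in s iff Q*(s, a) >= DELTA * V*(s).
# The looser tolerance here keeps pruning conservative under float noise.
kept = [[a for a in range(n_actions) if Q_star[s, a] >= DELTA * V_star[s] - 1e-6]
        for s in range(n_states)]
pruned_space = set(itertools.product(*kept))  # analogue of the pruned policy set

# Policies satisfying the assumed per-state bound (analogue of Pi_delta).
safe = {pi for pi in policies if np.all(evaluate(pi) >= DELTA * V_star - 1e-9)}

print("kept actions per state:", kept)
print(len(pruned_space), "policies after pruning,", len(safe), "satisfy the bound")
```

On this toy instance, pruning removes action 0 in state 1. The pruned space still contains policies (those taking action 1 in state 0) that violate the bound in state 0, so the inclusion is strict here.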

Figures (6)

Figure 1: The agent uses three inputs to generate safe explicable policies $\Pi^*_\mathcal{E}$: the ground-truth model $\mathcal{M}_R$, an estimate $\mathcal{M}_R^H$ of the human's understanding of that model, and a bound $\delta$.
Figure 2: (a) Relationship between the different sets of policies. (b) PDT vs. PAG on the pruned-action policy space $\widetilde{\Pi}$. Black nodes are expanded by PDT in descending order of state values under $\mathcal{M}_R$; blue nodes are expanded by PAG in ascending order under $\mathcal{M}_R^H$. Solid lines represent single-action policy updates and dashed lines represent multi-action updates.
Figure 3: Behavior comparison in the wumpus world. Black lines show the trajectories of the wumpus. Red line segments mark the parts of the agent's trajectories when the wumpus is in an adjacent cell, and green line segments mark the parts when the wumpus is at least two steps away. Shown are the most likely trajectories under (a) the agent's optimal policy, (b) the human's expectation, and the safe explicable policies obtained with $\delta = 0.90$ by (c) PDT+ and (d) PAG+, respectively.
Figure 4: Behavior comparison in the large cliff world. Grey areas are the cliff and G is the goal. The reward for each state is shown in its top-right corner. Shown are the most likely trajectories from (a) the optimal policy under $\mathcal{M}_R$, (b) the optimal policy under $\mathcal{M}_R^H$ (i.e., the human's expectation), and (c) the safe explicable policies returned by PDT+ (green) and PAG+ (blue).
Figure 5: (a) The Pareto set obtained by PDT+ with $\delta=0.90$ in the small cliff world, and the corresponding $V$ values in $\mathcal{M}_R^H$. Values highlighted in red are those that result in non-dominated policies. (b) The policy obtained by PAG+.
...and 1 more figures

Theorems & Definitions (5)

Definition 1
Lemma 1
Lemma 2
Theorem 1
Theorem 2
