Safe Explicable Planning
Akkamahadevi Hanni, Andrew Boateng, Yu Zhang
TL;DR
The paper addresses the tension in explicable planning between aligning an AI agent's behavior with human expectations and keeping that behavior safe. It formalizes Safe Explicable Planning (SEP) over two MDPs, $\mathcal{M}_R$ and $\mathcal{M}_R^H$, with a safety bound $\delta$ and an explicability objective based on the human-believed return, yielding a Pareto set of safe explicable policies $\Pi^*_{\mathcal{E}}$. The authors propose an action-pruning step and two search algorithms: PDT+ (policy descent tree search), an exact method that computes the Pareto set, and PAG+ (policy ascent greedy search), a more efficient method that finds one policy in that set. They also give approximate methods based on state aggregation for scalability, with formal proofs of the methods' properties and empirical validation in simulations and on a physical robot. The results indicate that SEP can produce safe, explicable behavior across domains while mitigating policy explosion, supporting safer human-robot collaboration while retaining explicability.
Abstract
People's expectations arise from their understanding of others and the world. In human-AI interaction, this understanding may not align with reality, so the AI agent may fail to meet expectations, compromising team performance. Explicable planning, introduced as a method to bridge this gap, aims to reconcile human expectations with the agent's optimal behavior, facilitating interpretable decision-making. However, a critical unresolved issue is safety: explicable planning can produce behaviors that are explicable but unsafe. To address this, we propose Safe Explicable Planning (SEP), which extends the prior work to support the specification of a safety bound. The goal of SEP is to find behaviors that align with human expectations while adhering to the specified safety criterion. Our approach generalizes to multiple objectives that stem from multiple models rather than a single model, yielding a Pareto set of safe explicable policies. We present both an exact method that is guaranteed to find the Pareto set and a more efficient greedy method that finds one of the policies in the Pareto set. Additionally, we offer approximate solutions based on state aggregation to improve scalability. We provide formal proofs that validate the desired theoretical properties of these methods. Evaluation through simulations and physical robot experiments confirms the effectiveness of our approach for safe explicable planning.
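To make the idea of a "Pareto set of safe explicable policies" concrete, the following is a minimal sketch and not the paper's PDT+ or PAG+ algorithm. It assumes, purely for illustration, that each candidate policy has already been evaluated, that safety means a scalar return meeting a lower bound `delta`, and that explicability is summarized as a tuple of objective values to be maximized. The names `dominates`, `safe_pareto_set`, and the policy labels are hypothetical.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def safe_pareto_set(
    candidates: Dict[str, Tuple[float, Objectives]], delta: float
) -> List[str]:
    """Return the names of safe, non-dominated candidates.

    candidates maps a policy name to (safety_return, objectives).
    ASSUMPTION: a policy is safe when safety_return >= delta.
    ASSUMPTION: all objectives are maximized.
    """
    # Step 1: discard every candidate that violates the safety bound.
    safe = {
        name: objectives
        for name, (safety_return, objectives) in candidates.items()
        if safety_return >= delta
    }
    # Step 2: keep only candidates that no other safe candidate dominates.
    return sorted(
        name
        for name, objectives in safe.items()
        if not any(dominates(other, objectives) for other in safe.values())
    )


# Hypothetical evaluations: (safety return, explicability objectives).
candidates = {
    "pi_a": (10.0, (3.0, 1.0)),
    "pi_b": (8.0, (2.0, 4.0)),
    "pi_c": (9.0, (2.0, 0.5)),  # safe, but dominated by pi_a
    "pi_d": (4.0, (9.0, 9.0)),  # best objectives, but unsafe when delta = 7
}

print(safe_pareto_set(candidates, delta=7.0))  # ['pi_a', 'pi_b']
```

With `delta = 7.0`, `pi_d` is excluded despite having the best objective values, which illustrates why adding a safety bound can change which behaviors count as the best explicable ones. This brute-force filter enumerates every candidate, so it does not scale to large policy spaces, which is the problem the pruning, search, and state-aggregation methods described above are meant to address.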
