Observer-Aware Probabilistic Planning Under Partial Observability
Salomé Lepers, Vincent Thomas, Olivier Buffet
TL;DR
This work extends observer-aware planning to partial observability via PO-OAMDPs, which support dynamic target variables and let the agent control what the observer infers. It embeds BST belief updates to track the observer's belief over states and derives observer-aware rewards that depend on the observer's belief about the target variable, transforming PO-OAMDPs into an equivalent state-belief MDP. The authors adapt HSVI to operate in the information space, address SSP and complexity considerations, and demonstrate nontrivial legible, explicable, and predictable behaviors on maze-like benchmarks, with improved performance over baseline observer policies. The framework broadens realistic, human-aware planning by allowing the agent to strategically reveal information to, or conceal it from, an observer under uncertainty, with dedicated solution approaches and practical initialization strategies.
Abstract
In this article, we are interested in planning problems where the agent is aware of the presence of an observer, and where this observer is in a partial observability situation. The agent has to choose its strategy so as to optimize the information transmitted by observations. Building on observer-aware Markov decision processes (OAMDPs), we propose a framework to handle this type of problem and thus formalize properties such as legibility, explicability and predictability. This extension of OAMDPs to partial observability not only handles more realistic problems, but also permits considering dynamic hidden variables of interest. These dynamic target variables allow, for instance, working with predictability, or with legibility problems where the goal might change during execution. We discuss theoretical properties of PO-OAMDPs and, experimenting with benchmark problems, we analyze HSVI's convergence behavior with dedicated initializations and study the resulting strategies.
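
To make the idea of "information transmitted by observations" concrete, here is a minimal sketch of how an observer might update a belief over a hidden target variable (for example, the agent's goal) after one observation, and how a legibility-style reward could be read off that belief. This is a generic Bayesian update written for illustration, not the authors' formulation: the names `update_belief` and `legibility_reward`, the goal labels, and the likelihood values are all hypothetical.

```python
from typing import Dict


def update_belief(belief: Dict[str, float],
                  likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes rule: posterior(target) is proportional to
    prior(target) * P(observation | target)."""
    unnormalized = {t: belief[t] * likelihood.get(t, 0.0) for t in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under every hypothesis.
        # Keeping the prior is an arbitrary choice made for this sketch.
        return dict(belief)
    return {t: p / total for t, p in unnormalized.items()}


def legibility_reward(belief: Dict[str, float], true_target: str) -> float:
    """Assumed reward: probability the observer assigns to the true target."""
    return belief[true_target]


# Hypothetical example: two candidate goals, uniform prior.
prior = {"goal_A": 0.5, "goal_B": 0.5}
# Assumed probability of the observed move under each goal.
likelihood = {"goal_A": 0.8, "goal_B": 0.2}

posterior = update_belief(prior, likelihood)
reward = legibility_reward(posterior, "goal_A")
```

Under these assumed numbers, the observation shifts the observer's belief toward `goal_A`, so an agent pursuing `goal_A` would be rewarded for choosing moves that are more likely under that goal than under the alternative.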
