Observer-Aware Probabilistic Planning Under Partial Observability

Salomé Lepers, Vincent Thomas, Olivier Buffet

TL;DR

This work extends observer-aware planning to partial observability through PO-OAMDPs, which support dynamic target variables and let the agent control what the observer infers. It embeds BST belief updates to track the observer's belief over states and derives observer-aware rewards that depend on the belief over the target variable, which turns a PO-OAMDP into an equivalent state-belief MDP. The authors adapt HSVI to operate in the information space, address SSP and complexity considerations, and demonstrate nontrivial legibility, explicability, and predictability behaviors on maze-like benchmarks, with improved performance over baseline observer policies. The framework makes human-aware planning more realistic by letting the agent strategically reveal information to, or conceal it from, an observer under uncertainty, and it comes with theoretically grounded solution methods and practical initialization strategies.

Abstract

In this article, we are interested in planning problems where the agent is aware of the presence of an observer, and where this observer has only partial observability. The agent has to choose its strategy so as to optimize the information transmitted by observations. Building on observer-aware Markov decision processes (OAMDPs), we propose a framework to handle this type of problem and thus formalize properties such as legibility, explicability and predictability. This extension of OAMDPs to partial observability not only handles more realistic problems, but also permits considering dynamic hidden variables of interest. These dynamic target variables allow, for instance, working with predictability, or with legibility problems where the goal might change during execution. We discuss theoretical properties of PO-OAMDPs and, experimenting with benchmark problems, we analyze HSVI's convergence behavior with dedicated initializations and study the resulting strategies.
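
To make the idea of "information transmitted by observations" concrete, the following minimal sketch shows a generic Bayesian update of an observer's belief over a hidden target variable (here, the agent's goal) when each move is seen only with probability `p_obs`. It is an illustration under stated assumptions, not the paper's BST update or its reward definitions: the function names, the two-goal setup, the observer's action model `move_model`, and the legibility-style reward are all hypothetical.

```python
from typing import Dict, Optional

Belief = Dict[str, float]


def update_target_belief(
    belief: Belief,
    move_model: Dict[str, Dict[str, float]],
    observation: Optional[str],
    p_obs: float,
) -> Belief:
    """One Bayes step over the hidden goal g.

    Assumed observation model (hypothetical): the observer sees the agent's
    move with probability p_obs, and sees nothing (None) otherwise.
    """
    unnormalized = {}
    for goal, prior in belief.items():
        if observation is None:
            # A missed observation is equally likely under every goal.
            likelihood = 1.0 - p_obs
        else:
            likelihood = p_obs * move_model[goal].get(observation, 0.0)
        unnormalized[goal] = prior * likelihood
    total = sum(unnormalized.values())
    if total == 0.0:
        # Observation impossible under the model: keep the prior unchanged.
        return dict(belief)
    return {goal: w / total for goal, w in unnormalized.items()}


def legibility_reward(belief: Belief, true_goal: str) -> float:
    """Assumed reward: the observer's belief in the agent's actual goal."""
    return belief[true_goal]


# Hypothetical observer model: P(move | goal) as the observer expects it.
move_model = {
    "left": {"west": 0.8, "east": 0.2},
    "right": {"west": 0.2, "east": 0.8},
}
prior = {"left": 0.5, "right": 0.5}

seen = update_target_belief(prior, move_model, "west", p_obs=0.5)
missed = update_target_belief(prior, move_model, None, p_obs=0.5)
```

In this sketch, an observed move shifts the belief toward the goal that best explains it, while a missed observation leaves the belief unchanged. A reward defined on that belief therefore depends on what the observer actually gets to see, which is the kind of dependency an observer-aware agent has to plan around.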

Paper Structure

This paper contains 51 sections, 6 theorems, 13 equations, 13 figures, 1 table, 1 algorithm.

Key Result

Proposition 0

Any OAMDP $\mathcal{M}$ with BST belief update can be turned into an equivalent PO-OAMDP $\mathcal{M}'$, i.e., such that an optimal solution to one problem is optimal for the other problem.

Figures (13)

  • Figure 1: An OAMDP agent (3) assumes that the observer expects (2) the agent to behave so as to achieve some task (1).
  • Figure 2: PO-OAMDP trajectories and corresponding belief evolutions for the legibility task with $p_\text{\sc obs}=1$ (so that the evolution is deterministic) for two different goal cells.
  • Figure 3: PO-OAMDP trajectories and corresponding belief evolutions for the legibility task with $p_\text{\sc obs}=1$ and $p_\text{\sc obs}=0.5$ (in the latter case, only one sampled belief evolution, in which the agent has been observed in $(D,5)$, is shown).
  • Figure 4: PO-OAMDP trajectory and corresponding belief evolution for the explicability task with $p_\text{\sc obs}=1$ (so that the evolution is deterministic) for the left goal cell.
  • Figure 5: PO-OAMDP trajectories and corresponding belief evolutions for predictability tasks.
  • ...and 8 more figures

Theorems & Definitions (10)

  • Proposition 0
  • Proposition 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Corollary 4
  • Proposition 4
  • proof