"Guess what I'm doing": Extending legibility to sequential decision tasks

Miguel Faria, Francisco S. Melo, Ana Paiva

TL;DR

This work introduces PoLMDP, a tractable framework for generating legible policies in sequential decision tasks under uncertainty. It defines a legible reward $r_{\rm leg}$ via $P((x,a)|r_n)=\frac{\exp(\beta Q_n^*(x,a))}{\sum_m \exp(\beta Q_m^*(x,a))}$, where $Q_m^*$ is the optimal Q-function for candidate goal $m$, and uses a total-variation-like mechanism. The resulting policies maintain high reward while disambiguating the intended goal, and they are solved with standard MDP methods rather than theory-of-mind models. Across maze-based simulations and human-robot interaction studies, PoLMDP computes legible policies faster and more reliably than Miura's Legible MDP (L-MDP) while reaching similar legibility levels. Its policies are also more effective than optimal-policy demonstrations for inverse reinforcement learning (IRL) agents, and they let humans infer the goal faster and with more confidence. These results position PoLMDP as a practical, scalable approach to legible planning with potential in HRI, shared autonomy, and collaborative robotics, especially in large state spaces and multi-goal settings.
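
The softmax expression above can be illustrated with a minimal sketch. It assumes the optimal Q-values $Q_m^*(x,a)$ for every candidate goal $m$ at a fixed state-action pair are already available (for example from value iteration). The function name `goal_likelihood` and the list layout are hypothetical choices for illustration, not the authors' implementation, and the summary only says the legible reward is defined via this quantity.

```python
import math


def goal_likelihood(q_values, goal, beta=1.0):
    """Return P((x, a) | r_goal) for one fixed state-action pair (x, a).

    q_values[m] is assumed to hold Q*_m(x, a) for candidate goal m.
    beta is the softmax temperature from the formula.
    """
    scaled = [beta * q for q in q_values]
    # Subtracting the maximum leaves the ratio unchanged and avoids overflow.
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    return weights[goal] / sum(weights)


# Hypothetical Q-values for goals A and B at the same (x, a).
q_at_xa = [2.0, 0.0]
p_goal_a = goal_likelihood(q_at_xa, goal=0)
p_goal_b = goal_likelihood(q_at_xa, goal=1)
```

An action whose Q-value is high for the intended goal and low for the alternatives receives a value close to 1, which is what makes it informative to an observer. When all goals value the action equally, the result is uniform across goals.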

Abstract

In this paper we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are computationally too expensive. Our proposed approach, dubbed PoL-MDP, is able to handle uncertainty while remaining computationally tractable. We establish the advantages of our approach against state-of-the-art approaches in several simulated scenarios of different complexity. We also showcase the use of our legible policies as demonstrations for an inverse reinforcement learning agent, establishing their superiority against the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study where people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.

Paper Structure (25 sections, 16 equations, 13 figures)

Figures (13)

  • Figure 1: Example of a maze-like environment with two goals, $A$ and $B$. The blue arrows indicate a possible action sequence following an optimal policy, while the black arrows indicate an action sequence following a legible policy (which is also optimal).
  • Figure 2: Comparison of PoLMDP and Miura's Legible MDP under the PoLMDP legibility metric as the number of possible goals varies in a mazeworld-like scenario.
  • Figure 3: Comparison of PoLMDP and Miura's Legible MDP under Miura's Legible MDP legibility metric as the number of possible goals varies in a mazeworld-like scenario.
  • Figure 4: Time performance of PoLMDP and Miura's Legible MDP as the number of possible goals varies in a mazeworld-like scenario. Solid lines show the average times; dashed lines show the percentage of failed tests.
  • Figure 5: Comparison of PoLMDP and Miura's Legible MDP under the PoLMDP legibility metric as the number of states varies in the mazeworld scenario.
  • ...and 8 more figures