
Addressing Myopic Constrained POMDP Planning with Recursive Dual Ascent

Paula Stocco, Suhas Chundi, Arec Jamgochian, Mykel J. Kochenderfer

TL;DR

This work shows that global dual variables in Lagrangian-guided Monte Carlo tree search can lead to myopic action selection during exploration and, ultimately, suboptimal decision making, and it introduces history-dependent dual variables that guide local action selection and are optimized with recursive dual ascent.

Abstract

Lagrangian-guided Monte Carlo tree search with global dual ascent has been applied to solve large constrained partially observable Markov decision processes (CPOMDPs) online. In this work, we demonstrate that these global dual parameters can lead to myopic action selection during exploration, ultimately leading to suboptimal decision making. To address this, we introduce history-dependent dual variables that guide local action selection and are optimized with recursive dual ascent. We empirically compare the performance of our approach on a motivating toy example and two large CPOMDPs, demonstrating improved exploration, and ultimately, safer outcomes.
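To make the distinction between global and local dual variables concrete, here is a minimal Python sketch of Lagrangian-guided action selection with projected dual ascent, in the general style of constrained POMCP planners. It is not the authors' implementation. The `Node` fields, the exploration constant `c_ucb`, the step size `lr`, and the per-history `budget` passed to `dual_step` are all hypothetical. How the paper derives budgets for each history in its recursive scheme is not shown here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node for one history h."""
    n: int = 0                                   # visit count N(h)
    n_a: dict = field(default_factory=dict)      # action visit counts N(h, a)
    q_r: dict = field(default_factory=dict)      # reward estimates Q_R(h, a)
    q_c: dict = field(default_factory=dict)      # cost estimates Q_C(h, a)
    lam: float = 0.0                             # hypothetical local dual variable lambda_h


def select_action(node, actions, lam, c_ucb=1.0):
    """Return argmax_a Q_R(h,a) - lam * Q_C(h,a) + UCB bonus; untried actions first."""
    best, best_val = None, -math.inf
    for a in actions:
        na = node.n_a.get(a, 0)
        if na == 0:
            return a  # expand an untried action before scoring the rest
        bonus = c_ucb * math.sqrt(math.log(node.n) / na)
        val = node.q_r[a] - lam * node.q_c[a] + bonus
        if val > best_val:
            best, best_val = a, val
    return best


def dual_step(lam, cost_estimate, budget, lr):
    """Projected dual ascent: raise lam when the estimated cost exceeds the budget, clip at 0."""
    return max(0.0, lam + lr * (cost_estimate - budget))


# Global variant: one lam shared by every node, updated only at the root.
# Local variant (hypothetical): each node scores actions with, and updates, its own node.lam.
node = Node(n=10, n_a={"safe": 5, "risky": 5},
            q_r={"safe": 1.0, "risky": 2.0}, q_c={"safe": 0.0, "risky": 1.0})
print(select_action(node, ["safe", "risky"], lam=0.0))  # prints "risky": cost is ignored
print(select_action(node, ["safe", "risky"], lam=2.0))  # prints "safe": cost now dominates
```

With a single global λ, the same reward-cost trade-off is applied at every history, so a value driven by root-level cost estimates also shapes exploration deep in the tree. Keeping a separate `lam` per node, updated by `dual_step` against that history's own cost estimate, lets each subtree settle on its own trade-off. This mirrors the motivation stated in the abstract.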

Paper Structure

This paper contains 11 sections, 3 equations, 3 figures, 2 tables, and 1 algorithm.

Figures (3)

  • Figure 1: A CPOMDP illustrating myopic decision making when search is guided by global dual variables. With a global $\lambda$, search explores either the cautious green belief or the budget-violating red belief and misses the optimal, feasible blue belief.
  • Figure 2: Constrained Tiger history trees showing visitation count $N$. From left to right, the action nodes are listen noisily, open left door, and open right door, and the observation nodes are tiger heard behind the right door and tiger not heard behind the right door. Local dual variables enable better exploration of the optimal action path, which requires listening before opening a door.
  • Figure 3: Discounted results for Constrained LightDark: average discounted returns and costs, with error bars denoting standard error and the constraint budget indicated in green.