Addressing Myopic Constrained POMDP Planning with Recursive Dual Ascent

Paula Stocco; Suhas Chundi; Arec Jamgochian; Mykel J. Kochenderfer

TL;DR

Global dual variables in Lagrangian-guided Monte Carlo tree search can lead to myopic action selection during exploration and, ultimately, to suboptimal decision making. This work introduces history-dependent dual variables that guide local action selection and are optimized with recursive dual ascent.

Abstract

Lagrangian-guided Monte Carlo tree search with global dual ascent has been applied to solve large constrained partially observable Markov decision processes (CPOMDPs) online. In this work, we demonstrate that these global dual parameters can lead to myopic action selection during exploration, ultimately leading to suboptimal decision making. To address this, we introduce history-dependent dual variables that guide local action selection and are optimized with recursive dual ascent. We empirically evaluate our approach on a motivating toy example and two large CPOMDPs, demonstrating improved exploration and, ultimately, safer outcomes.
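
To make the contrast between a single global dual variable and history-dependent dual variables concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes the standard projected dual ascent update $\lambda \leftarrow \max(0, \lambda + \alpha(\hat{c} - b))$, where $\hat{c}$ is an estimated cost, $b$ a cost budget, and $\alpha$ a step size. The `HistoryNode` class, its fields, the per-node budget, and the function names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


def dual_ascent_step(lam, cost_estimate, budget, step_size):
    """Projected dual ascent: raise lam when estimated cost exceeds the budget,
    lower it otherwise, and clip at zero so lam stays non-negative."""
    return max(0.0, lam + step_size * (cost_estimate - budget))


@dataclass
class HistoryNode:
    """Hypothetical search-tree node that carries its own dual variable."""
    lam: float = 0.0            # local dual variable for this history
    cost_estimate: float = 0.0  # estimated cost from this history onward
    budget: float = 0.0         # assumed cost budget remaining at this history
    children: list = field(default_factory=list)


def recursive_dual_ascent(node, step_size):
    """Update the dual variable at this node, then recurse into its children,
    so each history adapts its own lam rather than sharing one global value."""
    node.lam = dual_ascent_step(node.lam, node.cost_estimate, node.budget, step_size)
    for child in node.children:
        recursive_dual_ascent(child, step_size)


# Usage: the root overshoots its budget while the child is well within it.
child = HistoryNode(lam=0.5, cost_estimate=0.0, budget=1.0)
root = HistoryNode(lam=0.0, cost_estimate=2.0, budget=1.0, children=[child])
recursive_dual_ascent(root, step_size=0.5)
```

In this sketch the root's dual variable grows because its estimated cost exceeds its budget, while the child's shrinks to zero. A single global variable would have to compromise between the two histories.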

Paper Structure (11 sections, 3 equations, 3 figures, 2 tables, 1 algorithm)

Introduction
Background
CPOMDPs
Online Planning in CPOMDPs
Approach
Experiments
CPOMDP Problems
Search Efficacy
Performance
Conclusion
Acknowledgments

Figures (3)

Figure 1: A CPOMDP illustrating myopic decision making when guiding search with global dual variables. With a global $\lambda$, search explores either the cautious green belief or the budget-violating red belief and misses the optimal, feasible blue belief.
Figure 2: Constrained Tiger history trees showing visitation count $N$. From left to right, action nodes are listen noisily, open left door, and open right door, and observation nodes are tiger heard behind the right door and tiger not heard behind the right door. Local dual variables enable better exploration of the optimal action path, which requires listening before selecting other actions.
Figure 3: Discounted results for Constrained LightDark. Average discounted returns and costs with error bars denoting standard error and constrained budget indicated in green.
