Constrained Hierarchical Monte Carlo Belief-State Planning

Arec Jamgochian; Hugo Buurmeijer; Kyle H. Wray; Anthony Corso; Mykel J. Kochenderfer

TL;DR

This work addresses safe, online planning under state and transition uncertainty in CPOMDPs by introducing COBeTS, a belief-tree search over high-level options that uses hierarchical decomposition to scale to large or continuous domains. COBeTS combines Monte Carlo search, progressive widening, and Lagrangian dual ascent within the options framework. When feasible options are available, it guarantees constraint satisfaction; otherwise, it guides the search toward safety. This is demonstrated on four constrained domains (LightDark, Spillpoint, Bumper Roomba, and Lidar Roomba), including robotic ones. The key contributions are the COBeTS algorithm, its theoretical feasibility properties under local and global budget handling, and empirical demonstrations that hierarchical planning substantially outperforms non-hierarchical baselines on both reward and safety metrics. This approach offers a practical pathway to safe, scalable planning in real-time robotic systems where hard constraints must be satisfied despite uncertainty.
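To make the pairing of Monte Carlo search with Lagrangian dual ascent concrete, here is a minimal sketch of how a dual variable can steer child selection in a search tree. It assumes a single scalar cost constraint, a UCB-style exploration bonus, and running-mean value estimates. The names `NodeStats`, `select_option`, and `dual_ascent_step` are hypothetical, and this is an illustration of the general technique rather than the paper's implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NodeStats:
    # Hypothetical per-node statistics: visit counts and running-mean estimates
    # of discounted reward and discounted cost for each child option.
    n: dict = field(default_factory=dict)    # option -> visit count
    q_r: dict = field(default_factory=dict)  # option -> mean discounted reward
    q_c: dict = field(default_factory=dict)  # option -> mean discounted cost

    def update(self, option, reward_return, cost_return):
        # Incremental mean update after one simulation through `option`.
        k = self.n.get(option, 0) + 1
        self.n[option] = k
        old_r = self.q_r.get(option, 0.0)
        old_c = self.q_c.get(option, 0.0)
        self.q_r[option] = old_r + (reward_return - old_r) / k
        self.q_c[option] = old_c + (cost_return - old_c) / k


def select_option(stats, lam, c_explore=1.0):
    """Pick the visited child maximizing Q_R - lam * Q_C plus a UCB bonus."""
    total = sum(stats.n.values())

    def score(o):
        bonus = c_explore * math.sqrt(math.log(total) / stats.n[o])
        return stats.q_r[o] - lam * stats.q_c[o] + bonus

    return max(stats.n, key=score)


def dual_ascent_step(lam, estimated_cost, budget, step_size):
    """Projected gradient ascent on the dual variable (kept non-negative).

    lam grows when the estimated cost exceeds the budget and shrinks otherwise.
    """
    return max(0.0, lam + step_size * (estimated_cost - budget))
```

With `lam = 0` the search chases reward alone; as `dual_ascent_step` raises `lam` in response to budget violations, costly children are penalized and selection shifts toward cheaper ones.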

Abstract

Optimal plans in Constrained Partially Observable Markov Decision Processes (CPOMDPs) maximize reward objectives while satisfying hard cost constraints, generalizing safe planning under state and transition uncertainty. Unfortunately, online CPOMDP planning is extremely difficult in large or continuous problem domains. In many large robotic domains, hierarchical decomposition can simplify planning by using tools for low-level control given high-level action primitives (options). We introduce Constrained Options Belief Tree Search (COBeTS) to leverage this hierarchy and scale online search-based CPOMDP planning to large robotic problems. We show that if primitive option controllers are defined to satisfy assigned constraint budgets, then COBeTS will satisfy constraints anytime. Otherwise, COBeTS will guide the search towards a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety. We demonstrate COBeTS in several safety-critical, constrained partially observable robotic domains, showing that it can plan successfully in continuous CPOMDPs while non-hierarchical baselines cannot.
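The anytime guarantee above depends on option controllers that respect their assigned constraint budgets. The following minimal sketch shows one way such a budget could be enforced during search, under stated assumptions. Each option is assumed to carry a hypothetical `cost_bound`, the worst-case discounted cost its controller is designed not to exceed. There is a single scalar constraint, and the budget is measured in discounted cost. After an option runs for `duration` steps, the remaining budget is rescaled by `gamma ** duration` so that it is expressed from the new decision time. This is not necessarily how COBeTS assigns budgets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    # Hypothetical option descriptor; cost_bound is an assumed design guarantee.
    name: str
    cost_bound: float


def feasible_options(options, remaining_budget):
    """Keep only options whose designed cost bound fits in the remaining budget."""
    return [o for o in options if o.cost_bound <= remaining_budget]


def propagate_budget(budget, incurred_cost, duration, gamma):
    """Remaining discounted budget after an option that ran `duration` steps.

    If budget >= C_option + gamma**duration * C_future, then the future cost
    (measured from the new decision time) must satisfy
    C_future <= (budget - C_option) / gamma**duration.
    """
    return (budget - incurred_cost) / gamma ** duration
```

If `feasible_options` returns an empty list, no option is guaranteed to be safe. In that case the abstract's fallback applies: the search is guided toward a safe option sequence, and hierarchical monitoring provides runtime safety.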

Paper Structure (19 sections, 3 theorems, 3 equations, 3 figures, 1 table, 2 algorithms)

Introduction
Background
CPOMDPs
Hierarchical Planning
Methodology
Preliminaries
Constrained Options Belief-Tree Search (COBeTS)
Maintaining Feasibility Anytime with Options
Experiments
CPOMDP Problems and Option Policies
Constrained LightDark jamgochian2023online (C, D, C)
Constrained Spillpoint jamgochian2023online (C, C, C)
Constrained Bumper Roomba (C, D, D)
Constrained Lidar Roomba (C, D, C)
Experiments and Discussion
...and 4 more sections

Key Result

Proposition 1

For all policies $\pi$, the reward value functions and cost value functions of the belief-state CSMDP are equal to those of the CPOSMDP; that is, for all $b\in\tilde{\mathcal{S}}$, $\tilde{V}^\pi_R(b) = V^\pi_R(b)$ and $\tilde{V}^\pi_C(b) = V^\pi_C(b)$.
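The intuition behind this equality can be seen in a semi-Markov Bellman recursion, written here under the standard options construction. The policy picks option $o = \pi(b)$ in belief $b$. The option runs for a random duration $\tau$ and terminates in belief $b'$. The symbols $R$, $C$, $s_t$, $a_t$, and $\tau$ are assumed notation, and the paper's exact definitions may differ.

```latex
\begin{aligned}
\tilde{V}^\pi_R(b) &= \mathbb{E}\left[\sum_{t=0}^{\tau-1} \gamma^{t} R(s_t, a_t)
  + \gamma^{\tau}\, \tilde{V}^\pi_R(b') \,\middle|\, b,\ o = \pi(b)\right] \\
\tilde{V}^\pi_C(b) &= \mathbb{E}\left[\sum_{t=0}^{\tau-1} \gamma^{t} C(s_t, a_t)
  + \gamma^{\tau}\, \tilde{V}^\pi_C(b') \,\middle|\, b,\ o = \pi(b)\right]
\end{aligned}
```

Unrolling the recursion option by option concatenates the within-option sums. The factor $\gamma^{\tau}$ carried into $b'$ shifts each later primitive step to its absolute time index. The result is the ordinary discounted sums over primitive steps, which, under these assumptions, are exactly $V^\pi_R(b)$ and $V^\pi_C(b)$.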

Figures (3)

Figure 1: In CPFT-DPW jamgochian2023online (left), progressive widening (blue) is used to limit the branching factor of the Monte Carlo belief-state search tree, while dual parameters (red) are optimized to guide the search towards constraint satisfaction. COBeTS (right) leverages a hierarchy to decompose the partially observable planning problem, resulting in a search tree over options and semi-Markov belief transitions with potentially far fewer nodes.
Figure 2: Mean cumulative discounted rewards (above) and costs (below) vs. number of tree queries across 50 Constrained LightDark simulations when using COBeTS with feasible options. COBeTS stays safe anytime while CPFT-DPW only satisfies constraints in the limit.
Figure 3: Mean cumulative discounted rewards for different numbers of options averaged over 50 Constrained LightDark simulations. All costs are feasible (not shown). COBeTS can retain high reward at larger action branching factors because hierarchy induces a smaller overall tree size.
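Figure 1 refers to progressive widening, which limits a node's branching factor. A common rule allows a new child only while the child count satisfies $|C(h)| \le k\,N(h)^{\alpha}$, where $N(h)$ is the node's visit count. The sketch below implements that rule. The values `k = 1.0` and `alpha = 0.5` are illustrative assumptions rather than settings reported for CPFT-DPW or COBeTS, and both function names are hypothetical.

```python
def can_widen(num_children, num_visits, k=1.0, alpha=0.5):
    """Progressive widening: allow a new child while |C(h)| <= k * N(h)**alpha."""
    return num_children <= k * num_visits ** alpha


def children_after(visits, k=1.0, alpha=0.5):
    """Count children created at one node after `visits` simulations through it."""
    children = 0
    for n in range(1, visits + 1):
        if can_widen(children, n, k, alpha):
            children += 1  # expand a new action/option (or observation) branch
    return children
```

With `alpha = 0.5`, the number of children grows roughly like the square root of the visit count, so 100 visits yield only about a dozen branches. Because each option spans several primitive steps, a tree over options also has fewer decision levels. Together these effects allow hierarchy to produce the smaller overall tree described in Figure 3.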

Theorems & Definitions (8)

Proposition 1
Definition 1
Definition 2
Definition 3
Proposition 2
Proof
Proposition 3
Proof
