Constrained Hierarchical Monte Carlo Belief-State Planning
Arec Jamgochian, Hugo Buurmeijer, Kyle H. Wray, Anthony Corso, Mykel J. Kochenderfer
TL;DR
This work addresses safe, online planning under state and transition uncertainty in CPOMDPs by introducing COBeTS, a belief-tree search over high-level options that uses hierarchical decomposition to scale to large or continuous domains. By combining Monte Carlo search, progressive widening, and Lagrangian dual ascent within the options framework, COBeTS guarantees anytime constraint satisfaction when the option controllers are designed to respect their assigned cost budgets, and otherwise guides the search toward safe option sequences. This is demonstrated on four CPOMDP domains, including robotic ones. The key contributions are the COBeTS algorithm, its theoretical feasibility properties under local and global budget handling, and empirical demonstrations that hierarchical planning substantially outperforms non-hierarchical baselines on both reward and safety metrics. This approach offers a practical pathway to safe, scalable planning in real-time robotic systems where hard constraints must be satisfied despite uncertainty.
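The three ingredients named above can be illustrated with a minimal Python sketch. It assumes the standard form used in Lagrangian constrained Monte Carlo tree search (in the style of CC-POMCP): options are scored by a reward value minus a multiplier `lam` times a cost value plus a UCB bonus, `lam` is updated by projected dual ascent on the gap between estimated cost and budget, and progressive widening admits a new option only while the child count is at most `k * N**alpha`. The names `Node`, `allow_new_child`, `select_option`, and `dual_ascent`, and the parameters `k`, `alpha`, `c`, and `step`, are hypothetical; COBeTS's exact update rules may differ.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical belief-node statistics for a Lagrangian constrained tree search."""
    visits: int = 0
    n: dict = field(default_factory=dict)         # per-option visit counts
    q_reward: dict = field(default_factory=dict)  # per-option mean reward value
    q_cost: dict = field(default_factory=dict)    # per-option mean cost value


def allow_new_child(node: Node, k: float = 1.0, alpha: float = 0.5) -> bool:
    """Progressive widening: try a new option only while |children| <= k * N^alpha."""
    return len(node.n) <= k * node.visits ** alpha


def select_option(node: Node, lam: float, c: float = 1.0):
    """Return the option maximizing Q_R - lam * Q_C plus a UCB exploration bonus.

    Assumes every existing child has been visited at least once (n[o] > 0).
    """
    log_n = math.log(node.visits + 1)

    def score(o):
        bonus = c * math.sqrt(log_n / node.n[o])
        return node.q_reward[o] - lam * node.q_cost[o] + bonus

    return max(node.n, key=score)


def dual_ascent(lam: float, estimated_cost: float, budget: float, step: float = 0.1) -> float:
    """Raise lam when estimated cost exceeds the budget; lower it otherwise, never below 0."""
    return max(0.0, lam + step * (estimated_cost - budget))
```

For example, with `Node(visits=10, n={"a": 5, "b": 5}, q_reward={"a": 1.0, "b": 0.8}, q_cost={"a": 0.6, "b": 0.1})`, `select_option(node, 0.0)` picks the higher-reward option `"a"`. If option `"a"` is estimated to cost 0.6 against a budget of 0.2, one dual step with `step=2.0` raises `lam` to about 0.8, after which `select_option` prefers the cheaper option `"b"`. The multiplier thus trades reward against cost automatically as the search learns which options overspend.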
Abstract
Optimal plans in Constrained Partially Observable Markov Decision Processes (CPOMDPs) maximize reward objectives while satisfying hard cost constraints, generalizing safe planning under state and transition uncertainty. Unfortunately, online CPOMDP planning is extremely difficult in large or continuous problem domains. In many large robotic domains, hierarchical decomposition can simplify planning by using tools for low-level control given high-level action primitives (options). We introduce Constrained Options Belief Tree Search (COBeTS) to leverage this hierarchy and scale online search-based CPOMDP planning to large robotic problems. We show that if primitive option controllers are defined to satisfy assigned constraint budgets, then COBeTS will satisfy constraints anytime. Otherwise, COBeTS will guide the search towards a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety. We demonstrate COBeTS in several safety-critical, constrained partially observable robotic domains, showing that it can plan successfully in continuous CPOMDPs while non-hierarchical baselines cannot.
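The argument behind the anytime guarantee can be illustrated with a toy Python sketch. It assumes, for simplicity, undiscounted and deterministic costs (the paper's constraints are on expected costs), and that each option carries an assigned budget its controller is designed never to exceed. If the planner only commits options whose assigned budget fits in the remaining overall budget, the accumulated cost cannot exceed the total. The runtime check shown is a simple per-option monitor, not the paper's hierarchical monitoring scheme, and `Option`, `feasible_options`, and `run_plan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    assigned_budget: float  # hypothetical: cost this option's controller is designed never to exceed


def feasible_options(options, remaining_budget):
    """Keep only options whose assigned budget fits in what is left of the overall budget."""
    return [o for o in options if o.assigned_budget <= remaining_budget]


def run_plan(options_in_order, total_budget, realized_costs):
    """Execute options in order, returning the leftover budget.

    Raises ValueError if an option could not be afforded when committed, and
    RuntimeError if a controller spends more than its assigned budget (a toy monitor).
    """
    remaining = total_budget
    for opt, cost in zip(options_in_order, realized_costs):
        if opt.assigned_budget > remaining:
            raise ValueError(f"{opt.name} cannot be afforded")
        if cost > opt.assigned_budget:
            raise RuntimeError(f"{opt.name} exceeded its assigned budget")
        remaining -= cost  # realized cost <= assigned budget <= remaining, so remaining stays >= 0
    return remaining
```

With options `go` (budget 0.3), `wait` (0.1), and `rush` (0.9) and an overall budget of 0.5, `feasible_options` keeps only `go` and `wait`. Executing both with realized costs 0.25 and 0.1 leaves about 0.15 of the budget, whereas a `go` controller that actually spent 0.4 would be flagged by the monitor, since it breaks the assumption the guarantee rests on.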
