Vectorized Online POMDP Planning

Marcus Hoerger, Muhammad Sudrajat, Hanna Kurniawati

TL;DR

The paper introduces VOPP, a fully vectorized online POMDP planner that runs entirely on GPUs by representing the belief tree as tensors and executing forward search and backups as batched, dependency-free operations. Building on PORPP, VOPP analytically solves parts of the optimization and focuses numerical effort on expectation estimation, enabling massive data-parallelism. Empirical results show VOPP achieving at least 20x (and in some cases over 100x) speedups over HyP-DESPOT across large state, action, and observation spaces, while maintaining or improving policy quality. The work demonstrates scalability to complex robotics scenarios, including crowd navigation, and releases the implementation as open source.

Abstract

Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for such planning problems, capturing the stochastic effects of actions and the limited information available through noisy observations. POMDP solving could benefit tremendously from the massive parallelism of today's hardware, but parallelizing POMDP solvers has been challenging. Existing solvers interleave numerical optimization over actions with the estimation of action values, which creates dependencies and synchronization bottlenecks between parallel processes that can quickly offset the benefits of parallelization. In this paper, we propose the Vectorized Online POMDP Planner (VOPP), a novel parallel online solver that leverages a recent POMDP formulation that analytically solves part of the optimization component, leaving only the estimation of expectations for numerical computation. VOPP represents all data structures related to planning as a collection of tensors and implements all planning steps as fully vectorized computations over this representation. The result is a massively parallel solver with no dependencies or synchronization bottlenecks between parallel computations. Experimental results indicate that VOPP is at least 20x more efficient in computing near-optimal solutions than an existing state-of-the-art parallel online solver.
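To make the tensor representation concrete, here is a minimal sketch, assuming a flat layout in which each belief node is one row of several parallel arrays (parent index, incoming action, depth, value). NumPy stands in for a GPU tensor library, and the class and method names (TensorBeliefTree, append, child_value_sums) are hypothetical rather than VOPP's actual API. It illustrates only the general idea: appending many nodes, or aggregating many children into their parents, is a single array operation instead of a loop over nodes.

```python
import numpy as np


class TensorBeliefTree:
    """Hypothetical layout: belief node i is row i of every array, so per-node
    work becomes whole-array operations instead of pointer chasing."""

    def __init__(self, capacity, num_actions):
        self.num_actions = num_actions
        self.parent = np.full(capacity, -1, dtype=np.int64)         # parent node index
        self.parent_action = np.full(capacity, -1, dtype=np.int64)  # action taken at parent
        self.depth = np.zeros(capacity, dtype=np.int64)
        self.value = np.zeros(capacity)                              # belief value estimate
        self.size = 1                                                # node 0 is the root

    def append(self, parents, actions):
        """Append one child per (parent, action) pair with a single batched write."""
        parents = np.asarray(parents, dtype=np.int64)
        idx = np.arange(self.size, self.size + len(parents))
        if idx.size and idx[-1] >= len(self.parent):
            raise ValueError("tree capacity exceeded")
        self.parent[idx] = parents
        self.parent_action[idx] = actions
        self.depth[idx] = self.depth[parents] + 1
        self.size += len(parents)
        return idx

    def child_value_sums(self, nodes):
        """Scatter-add child values into a (parent, action) table; repeated
        (parent, action) pairs accumulate, so no sequential loop is needed."""
        sums = np.zeros((len(self.parent), self.num_actions))
        np.add.at(sums, (self.parent[nodes], self.parent_action[nodes]), self.value[nodes])
        return sums


# Usage: three children of the root, two of them reached via action 0.
tree = TensorBeliefTree(capacity=8, num_actions=2)
kids = tree.append([0, 0, 0], [0, 0, 1])
tree.value[kids] = [1.0, 2.0, 5.0]
sums = tree.child_value_sums(kids)
```

np.add.at is used because plain fancy-index assignment would keep only one write per repeated index; GPU tensor libraries provide comparable scatter-add primitives.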

Paper Structure

This paper contains 16 sections, 6 equations, 3 figures, 2 tables, and 3 algorithms.

Figures (3)

  • Figure 1: Illustration of the two main vectorized operations of VOPP: forward search (a) and preference backup (b). Blue circles represent belief nodes, while yellow squares represent action nodes. The green lines represent sampled episodes. (a) Vectorized forward search: VOPP samples an action for each episode from the belief nodes $\mathbf{B}_{d}$ at depth $d$ in parallel and collects the sampled actions in the action tensor $\mathbf{A}_{\text{sampled}}$. It then performs a vectorized forward simulation of the episodes by one step using the generative model $G$ and $\mathbf{A}_{\text{sampled}}$. For the resulting observations, VOPP appends new belief nodes to $\mathbf{B}$ if they do not exist yet. The search then continues from the belief nodes at depth $d+1$ that the episodes visit. (b) Vectorized preference backup: for all belief nodes $\mathbf{B}_d$ at depth $d$, VOPP updates the preference values $\mathbf{\Psi}$ of their parent actions in a single vectorized step. The updated preference values are then used to compute the belief values of all beliefs $\mathbf{B}_{d-1}$ at depth $d-1$ in one vectorized step, before the backup continues from $d-1$.
  • Figure 2: The problem scenarios used to evaluate VOPP.
  • Figure 3: Two partial trajectories of the Stretch 3 mobile robot in the CrowdNav scenario with $p_{\text{curious}} = 0.0$ (top) and $p_{\text{curious}} = 1.0$ (bottom) at different time steps. Nearby people are colored according to their inferred character trait. Darker red tones indicate a higher probability of a person being curious.
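The vectorized forward search of Figure 1(a) can be sketched as one batched step per depth. This is an illustrative sketch under assumptions not taken from the paper: actions are sampled from a softmax over the preferences Ψ stored at each node, children are kept in a dense child[b, a, o] table (which presumes small discrete action and observation spaces), and toy_generative_model is a hypothetical stand-in for G. Episodes that reach the same new (node, action, observation) branch are deduplicated so they share one appended belief node, mirroring "appends new belief nodes if they do not exist yet".

```python
import numpy as np

N_ACTIONS, N_OBS = 3, 4


class TensorTree:
    """Hypothetical layout: child[b, a, o] is the node reached from node b via
    action a and observation o (-1 if absent); psi holds action preferences."""

    def __init__(self, capacity):
        self.child = np.full((capacity, N_ACTIONS, N_OBS), -1, dtype=np.int64)
        self.psi = np.zeros((capacity, N_ACTIONS))
        self.size = 1  # node 0 is the root belief


def toy_generative_model(states, actions):
    """Hypothetical stand-in for G: batched (s, a) -> (s', o, r) on integer states."""
    next_states = (states + actions + 1) % 10
    observations = next_states % N_OBS
    rewards = -np.abs(next_states - 5).astype(float)
    return next_states, observations, rewards


def sample_actions(tree, nodes, rng):
    """Draw one action per episode from softmax(psi) at its node (assumed policy form)."""
    logits = tree.psi[nodes]
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    u = rng.random((len(nodes), 1))
    # Inverse-CDF sampling for all rows at once; the clip guards against rounding.
    return np.minimum((probs.cumsum(axis=1) < u).sum(axis=1), N_ACTIONS - 1)


def forward_step(tree, nodes, states, rng):
    """Advance every episode by one step; create missing child beliefs in one batch."""
    actions = sample_actions(tree, nodes, rng)
    states, obs, rewards = toy_generative_model(states, actions)
    nxt = tree.child[nodes, actions, obs]  # advanced indexing returns a copy
    missing = nxt < 0
    if missing.any():
        # Encode (node, action, obs) as one integer so episodes reaching the same
        # new branch are deduplicated and share a single new node.
        keys = (nodes[missing] * N_ACTIONS + actions[missing]) * N_OBS + obs[missing]
        uniq, inverse = np.unique(keys, return_inverse=True)
        assert tree.size + len(uniq) <= len(tree.child), "tree capacity exceeded"
        new_ids = np.arange(tree.size, tree.size + len(uniq))
        b, rest = np.divmod(uniq, N_ACTIONS * N_OBS)
        a, o = np.divmod(rest, N_OBS)
        tree.child[b, a, o] = new_ids
        nxt[missing] = new_ids[inverse]
        tree.size += len(uniq)
    return nxt, states, actions, rewards


# Usage: five episodes start at the root with hypothetical particles drawn from it.
rng = np.random.default_rng(0)
tree = TensorTree(capacity=64)
nodes = np.zeros(5, dtype=np.int64)
states = np.zeros(5, dtype=np.int64)
nxt, states, actions, rewards = forward_step(tree, nodes, states, rng)
```

Calling forward_step repeatedly with the returned nxt and states continues the search from depth $d+1$; the dense child table trades memory for constant-time, branch-free lookups.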
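The level-by-level preference backup of Figure 1(b) can likewise be sketched with scatter-adds. The exact PORPP preference update is not reproduced here; as a labeled substitute, this sketch sets each parent action's preference to the visit-weighted mean of $r + \gamma V(\text{child})$ over its children, and computes a belief's value as the soft maximum $\eta \log \frac{1}{|A|}\sum_a \exp(\Psi(b,a)/\eta)$, i.e. a KL-regularized value under a uniform reference policy. GAMMA, ETA, and the function names are hypothetical. Only the loop over depths is sequential; all nodes at one depth are processed in a single pass, in the order the caption describes.

```python
import numpy as np

GAMMA, ETA = 0.95, 1.0  # illustrative discount and temperature (assumed values)


def backup_depth(d, depth, parent, parent_action, reward, visits, value, psi):
    """Update psi for the parent actions of all depth-d nodes at once, then
    recompute the values of their depth-(d-1) parents at once."""
    nodes = np.flatnonzero(depth == d)
    rows, cols = parent[nodes], parent_action[nodes]
    # Visit-weighted mean of r + gamma * V(child) per (parent, action) via scatter-add.
    num = np.zeros_like(psi)
    den = np.zeros_like(psi)
    np.add.at(num, (rows, cols), visits[nodes] * (reward[nodes] + GAMMA * value[nodes]))
    np.add.at(den, (rows, cols), visits[nodes])
    touched = den > 0
    psi[touched] = num[touched] / den[touched]  # untried actions keep their old preference
    # Assumed soft-max value: V(b) = eta * log(mean_a exp(psi(b, a) / eta)), computed
    # stably; only parents with children are updated, so leaf values are preserved.
    parents = np.unique(rows)
    z = psi[parents] / ETA
    zmax = z.max(axis=1, keepdims=True)
    value[parents] = ETA * (zmax[:, 0] + np.log(np.mean(np.exp(z - zmax), axis=1)))


def backup(depth, parent, parent_action, reward, visits, value, psi):
    """Sweep from the deepest level to depth 1; each level is one vectorized pass."""
    for d in range(int(depth.max()), 0, -1):
        backup_depth(d, depth, parent, parent_action, reward, visits, value, psi)


# Usage: root 0 with children 1 (action 0) and 2 (action 1); node 3 is a child of 1.
depth = np.array([0, 1, 1, 2])
parent = np.array([-1, 0, 0, 1])
parent_action = np.array([-1, 0, 1, 0])
reward = np.array([0.0, 1.0, 0.0, 2.0])
visits = np.ones(4)
value = np.zeros(4)
psi = np.zeros((4, 2))
backup(depth, parent, parent_action, reward, visits, value, psi)
```

Because each level touches only arrays indexed by that level's nodes, the work per level has no dependencies between nodes, which is the property the caption attributes to VOPP's backup.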