Previous Knowledge Utilization In Online Anytime Belief Space Planning

Michael Novitsky, Moran Barenboim, Vadim Indelman

TL;DR

This work tackles online planning under uncertainty in continuous, non-parametric POMDPs by reusing information from previous planning sessions. It introduces the Incremental Reuse Particle Filter Tree (IR-PFT), which combines sample reuse based on Multiple Importance Sampling (MIS) with Monte Carlo Tree Search (MCTS) to accelerate decision-making while maintaining performance. The key contributions are an incremental MIS update theorem, MIS-based experience estimation from offline trajectories, and an anytime planning algorithm that extends reused propagated beliefs with horizon alignment. Empirical results on a 2D Light-Dark task show runtime speedups of around 1.5x with negligible impact on accumulated reward.

Abstract

Online planning under uncertainty remains a critical challenge in robotics and autonomous systems. While tree search techniques are commonly employed to construct partial future trajectories within computational constraints, most existing methods for continuous spaces discard information from previous planning sessions. This study presents a novel, computationally efficient approach that leverages historical planning data in the current decision-making process. We provide theoretical foundations for our information reuse strategy and introduce an algorithm based on Monte Carlo Tree Search (MCTS) that implements this approach. Experimental results demonstrate that our method significantly reduces computation time while maintaining high performance levels. Our findings suggest that integrating historical planning information can substantially improve the efficiency of online decision-making in uncertain environments, paving the way for more responsive and adaptive autonomous systems.

Paper Structure

This paper contains 16 sections, 3 theorems, 30 equations, 4 figures, 4 algorithms.

Key Result

Theorem 1

Consider an MIS estimator mis_balance_h with $M$ different distributions and $n_m$ samples for each distribution $q_m \in \{q_1, \ldots, q_M\}$. Given a batch of $L$ i.i.d. samples from distribution $q_{m'}$, where $q_{m'}$ could be one of the existing distributions or a new, previously unseen distribution ...
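The theorem statement is cut off here, so the following is only a minimal sketch of the general idea of updating an MIS estimate when a new batch arrives, not the paper's result. It assumes the estimator uses the standard balance heuristic (suggested by the name `mis_balance_h`), under which the estimate of $\mathbb{E}_p[f(x)]$ is $\sum_{m}\sum_{i} p(x_{m,i}) f(x_{m,i}) / \sum_{j} n_j q_j(x_{m,i})$. The class `IncrementalBalanceMIS`, the helper `gaussian_pdf`, and the Gaussian densities in `demo` are hypothetical names and choices made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian (illustrative choice of distribution)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


class IncrementalBalanceMIS:
    """Balance-heuristic MIS estimate of E_p[f(x)] that accepts new batches.

    Hypothetical sketch: each stored sample caches its numerator p(x) * f(x)
    and its denominator sum_j n_j * q_j(x), so a new batch only adds one term
    to the cached denominators instead of re-evaluating every proposal.
    """

    def __init__(self, target_pdf, f):
        self.target_pdf = target_pdf
        self.f = f
        self.samples = []
        self.numerators = []    # p(x) * f(x), fixed once a sample is stored
        self.denominators = []  # sum_j n_j * q_j(x), grows with each batch
        self.proposals = []     # (pdf, sample count) for every batch seen

    def add_batch(self, proposal_pdf, batch):
        """Add L samples drawn from proposal_pdf (an existing or a new one)."""
        count = len(batch)
        # Existing samples: only the new proposal's term must be added.
        for i, x in enumerate(self.samples):
            self.denominators[i] += count * proposal_pdf(x)
        # New samples: evaluate all earlier proposals plus the new one.
        for x in batch:
            denom = count * proposal_pdf(x)
            denom += sum(n * q(x) for q, n in self.proposals)
            self.samples.append(x)
            self.numerators.append(self.target_pdf(x) * self.f(x))
            self.denominators.append(denom)
        self.proposals.append((proposal_pdf, count))

    def estimate(self):
        """Current balance-heuristic estimate over all stored samples."""
        return sum(n / d for n, d in zip(self.numerators, self.denominators))


def demo(seed=0):
    """Estimate E[x^2] under N(0, 1) using two proposals, added in batches."""
    rng = random.Random(seed)
    target = lambda x: gaussian_pdf(x, 0.0, 1.0)
    q1 = lambda x: gaussian_pdf(x, 0.5, 1.5)
    q2 = lambda x: gaussian_pdf(x, -0.5, 2.0)
    est = IncrementalBalanceMIS(target, lambda x: x * x)
    est.add_batch(q1, [rng.gauss(0.5, 1.5) for _ in range(500)])
    est.add_batch(q2, [rng.gauss(-0.5, 2.0) for _ in range(500)])
    est.add_batch(q1, [rng.gauss(0.5, 1.5) for _ in range(500)])  # reused q1
    return est.estimate()
```

In this sketch, a batch from an already-seen distribution is stored as a separate entry in `proposals`, which gives the same denominator as raising that distribution's count $n_m$ by $L$. The cost of adding a batch is one density evaluation per stored sample plus a full evaluation for each new sample, rather than recomputing every weight from scratch.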

Figures (4)

  • Figure 1: $\tau^i$ is a trajectory that was executed by an agent that followed policy $\pi$, $\tau^i_{suffix}$ is the part that we reuse from $\tau^i$ for the current belief $b_k$ and action $a_k$.
  • Figure 2: Illustration of reuse of three trajectories.
  • Figure 3: Illustration of horizon gap.
  • Figure 4: Light-Dark experiments comparing PFT and IR-PFT.

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Corollary 1