Anytime Probabilistically Constrained Provably Convergent Online Belief Space Planning

Andrey Zhitnikov, Vadim Indelman

TL;DR

This article presents an anytime approach that employs Monte Carlo Tree Search (MCTS) in domains with continuous states, actions, and observations and with general belief-dependent reward and payoff operators. It proves that a version of the algorithms converges in probability at an exponential rate.

Abstract

Taking into account future risk is essential for an autonomously operating robot to find online not only the best but also a safe action to execute. In this paper, we build upon the recently introduced formulation of probabilistic belief-dependent constraints. We present an anytime approach employing the Monte Carlo Tree Search (MCTS) method in continuous domains. Unlike previous approaches, our method assures safety anytime with respect to the currently expanded search tree without relying on the convergence of the search. We prove convergence in probability with an exponential rate of a version of our algorithms and study the proposed techniques via extensive simulations. Even with a tiny number of tree queries, the best action found by our approach is much safer than the one found by the baseline. Moreover, our approach consistently finds an action that is better than the baseline's in terms of the objective. This is because we revise the values and statistics maintained in the search tree and remove from them the contribution of the pruned actions.
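The last sentence of the abstract can be illustrated with a small sketch. It assumes that a belief node caches a visit count and a value that is the visit-weighted mean of its children's action-value estimates, and it shows how the contribution of one pruned action might be subtracted from those cached statistics. All names here (`ActionStats`, `BeliefNode`, `prune_action`) are hypothetical, and the sketch covers a single node only; it is not the paper's algorithm, which also has to handle the safety estimates and the rest of the tree.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ActionStats:
    visits: int      # times this action was selected at the belief node
    q_value: float   # mean return estimate for this action


@dataclass
class BeliefNode:
    # Cached statistics (assumption: value is the visit-weighted mean of q_values).
    visits: int = 0
    value: float = 0.0
    actions: Dict[str, ActionStats] = field(default_factory=dict)

    def add_action(self, name: str, stats: ActionStats) -> None:
        """Register a child action and fold its statistics into the cache."""
        total = self.visits + stats.visits
        if total > 0:
            self.value = (self.visits * self.value
                          + stats.visits * stats.q_value) / total
        self.visits = total
        self.actions[name] = stats


def prune_action(node: BeliefNode, name: str) -> None:
    """Remove an action and subtract its contribution from the cached statistics."""
    stats = node.actions.pop(name)
    remaining = node.visits - stats.visits
    if remaining > 0:
        node.value = (node.visits * node.value
                      - stats.visits * stats.q_value) / remaining
    else:
        node.value = 0.0  # no surviving actions, so nothing is left to average
    node.visits = remaining


node = BeliefNode()
node.add_action("a1", ActionStats(visits=3, q_value=1.0))
node.add_action("a2", ActionStats(visits=1, q_value=-5.0))  # hypothetical unsafe action
prune_action(node, "a2")
```

After pruning, the cached value depends only on the surviving action, so a low return from the pruned action no longer drags down the node's estimate.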

Paper Structure

This paper contains 44 sections, 8 theorems, 67 equations, 30 figures, 5 tables, 5 algorithms.

Key Result

Lemma 1

The value function under a stochastic execution policy takes the following form

Figures (30)

  • Figure 1: The asymmetric search tree approximating a stochastic future policy. For simplicity, the action space here is $\mathcal{A}{=}\{a^1, a^2\}$. Many actions emanate from each belief node, and each action has a weight defined by the relevant visitation count, as in \ref{eq:QEstApproxMCTSNORoll}. Thus, MCTS approximates a stochastic future policy. Note that the observations and beliefs have a global index (superscript), while actions have a local index according to the action number in the space $\mathcal{A}$.
  • Figure 2:
  • Figure 3:
  • Figure 5: Illustration of the effect of truncation of motion model T.
  • Figure 6:
  • ...and 25 more figures

Theorems & Definitions (11)

  • Lemma 1: Representation of the Value Function
  • Theorem 1: Necessary condition for entire observation space $\mathcal{Z}$ of children of $h^-_{\ell}$ to be safe
  • Theorem 2: Representation of PC, recursive form
  • Definition 1: Dangerous action
  • Corollary 1
  • Theorem 3
  • Theorem 4: Convergence with Exponential Rate in Probability
  • Lemma 2
  • Lemma 3
  • Definition 2: Regularity Hypothesis
  • ...and 1 more