Stochastic Shortest Path Problem with Failure Probability

Ritsusamuel Otsubo

TL;DR

This work extends the stochastic shortest path framework to handle failure risk explicitly by introducing dead-ends and an allowable failure threshold $\epsilon$. It formulates policy optimization from a BAMDP perspective together with a two-player zero-sum game, which yields a $J_{c,\gamma,\epsilon}$ objective that blends the cost of successful episodes with a penalty for potential failures. The author develops finite-approximation schemes (Case S and Case M) that enable practical computation via value iteration on bounded MDPs, with theoretical guarantees as $\gamma \to 1$ and $M \to \infty$. The approach is validated on a motion-planning problem with obstacle avoidance, where widening the search beyond conservative max-prob policies yields faster, less costly routes while keeping the failure probability within the prescribed bound. The framework thus offers a method for risk-aware sequential decision making in uncertain environments.
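
As a rough illustration of how a failure penalty can steer value iteration between a risky and a conservative route, the following sketch runs generic discounted value iteration on a three-state toy MDP. It is not the paper's Case S or Case M scheme, and it does not reproduce the $J_{c,\gamma,\epsilon}$ objective. The state names, costs, transition probabilities, and the dead-end penalty are all invented for the example.

```python
# Toy MDP (all numbers are assumptions made for illustration).
# TOY_MDP[state][action] = (step_cost, [(next_state, probability), ...])
TOY_MDP = {
    "start": {
        # Slow but safe: may need several attempts, never fails.
        "detour": (3.0, [("goal", 0.5), ("start", 0.5)]),
        # Fast but risky: 20% chance of ending in the dead-end.
        "shortcut": (1.0, [("goal", 0.8), ("dead_end", 0.2)]),
    }
}


def value_iteration(mdp, terminal_values, gamma=0.95, tol=1e-10, max_iters=10_000):
    """Generic discounted value iteration for a cost-minimizing MDP.

    terminal_values fixes the value of absorbing states; the value assigned
    to the dead-end acts as a (hypothetical) failure penalty.
    """
    values = {state: 0.0 for state in mdp}
    values.update(terminal_values)

    def q_value(state, action):
        cost, outcomes = mdp[state][action]
        return cost + gamma * sum(p * values[nxt] for nxt, p in outcomes)

    for _ in range(max_iters):
        delta = 0.0
        for state in mdp:
            best = min(q_value(state, action) for action in mdp[state])
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        state: min(mdp[state], key=lambda action: q_value(state, action))
        for state in mdp
    }
    return values, policy


if __name__ == "__main__":
    for penalty in (10.0, 50.0):
        values, policy = value_iteration(
            TOY_MDP, {"goal": 0.0, "dead_end": penalty}
        )
        print(penalty, policy["start"], round(values["start"], 3))
```

In this toy setting a small penalty makes the shortcut optimal, while a large penalty pushes the greedy policy back to the detour. This mirrors, in a very simplified way, the trade-off between route cost and failure risk described above.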

Abstract

We solve a sequential decision-making problem under uncertainty that takes into account the failure probability of a task. This problem cannot be handled by the stochastic shortest path problem, which is the standard model for sequential decision-making. This problem is addressed by introducing dead-ends. Conventionally, we only consider policies that minimize the probability of task failure, so the optimal policy constructed could be overly conservative. In this paper, we address this issue by expanding the search range to a class of policies whose failure probability is less than a desired threshold. This problem can be solved by treating it as a framework of a Bayesian Markov decision process and a two-person zero-sum game. Also, it can be seen that the optimal policy is expressed in the form of a probability distribution on a set of deterministic policies. We also demonstrate the effectiveness of the proposed methods by applying them to a motion planning problem with obstacle avoidance for a moving robot.
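
To make the idea of a probability distribution over deterministic policies concrete, here is a minimal sketch under strong simplifying assumptions. It assumes exactly two deterministic policies, one of which is drawn at random at the start of each episode, so that the failure probability of the mixture is the weighted average of the two individual failure probabilities. The function names and all numbers are hypothetical and are not taken from the paper.

```python
def max_risky_weight(fail_safe, fail_risky, epsilon):
    """Largest probability of picking the risky policy such that the
    mixture's failure probability stays at or below epsilon."""
    if fail_safe > epsilon:
        raise ValueError("even the safe policy violates the threshold")
    if fail_risky <= epsilon:
        return 1.0  # the risky policy is already admissible on its own
    # Solve w * fail_risky + (1 - w) * fail_safe = epsilon for w.
    return (epsilon - fail_safe) / (fail_risky - fail_safe)


def mix(weight, risky_value, safe_value):
    """Expected value of a per-policy quantity under the mixture."""
    return weight * risky_value + (1.0 - weight) * safe_value


if __name__ == "__main__":
    # Invented numbers: a safe detour and a risky shortcut.
    fail_safe, fail_risky = 0.0, 0.2
    cost_safe, cost_risky = 10.0, 4.0
    epsilon = 0.05

    w = max_risky_weight(fail_safe, fail_risky, epsilon)
    print("weight on risky policy:", w)
    print("mixture failure probability:", mix(w, fail_risky, fail_safe))
    print("mixture expected cost:", mix(w, cost_risky, cost_safe))
```

With these made-up numbers neither deterministic policy alone is ideal: the shortcut exceeds the threshold and the detour is needlessly expensive. Randomizing between them meets the threshold at a lower expected cost than the detour alone.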

Paper Structure

This paper contains 32 sections, 16 theorems, 99 equations, 5 figures, 2 tables.

Key Result

Theorem 1

Given $\pi \in \Pi$, ... Here, $h_d(x')$ equals 1 if $x' \neq 0$ and 0 otherwise.

Figures (5)

  • Figure 1: Motion planning with obstacle avoidance: Deriving a policy to reach the goal quickly while reducing the probability of collision with obstacles A, B and the wall to $\epsilon=0.05$
  • Figure 2: Trajectories for 50 episodes when $\pi_{max}^{(S)}$ is applied: The robot takes a large detour to the right of obstacle A in many episodes
  • Figure 3: Trajectories for 50 episodes when $\pi_{max}^{(M)}$ is applied: The robot takes a large detour to the right of obstacle A in many episodes
  • Figure 4: Trajectories for 50 episodes when $\pi_{mod}^{(S)}$ is applied: Compared with policy $\pi_{max}^{(S)}$, the robot more often selects a trajectory that passes to the left of obstacle A, which carries a collision risk but reaches the terminal state in less time
  • Figure 5: Trajectories for 50 episodes when $\pi_{mod}^{(M)}$ is applied: Compared with policy $\pi_{max}^{(M)}$, the robot more often selects a trajectory that passes to the left of obstacle A, which carries a collision risk but reaches the terminal state in less time

Theorems & Definitions (22)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Theorem 8
  • Theorem 9
  • Theorem 10
  • ...and 12 more