Stochastic Shortest Path Problem with Failure Probability

Ritsusamuel Otsubo

TL;DR

This work extends the stochastic shortest path framework to explicitly handle failure risk by introducing dead-ends and an allowable failure threshold $\epsilon$. It jointly optimizes a policy using a BAMDP perspective and a two-player zero-sum game, resulting in a $J_{c,\gamma,\epsilon}$ objective that blends the cost of successful episodes with a penalty for potential failures. The authors develop finite-approximation schemes (Case S and Case M) that enable practical computation via value iteration on bounded MDPs, with theoretical guarantees as $\gamma\to1$ and $M\to\infty$. They validate the approach on a motion-planning problem with obstacle avoidance, showing that expanding beyond conservative max-prob policies yields faster, less costly routes while keeping the failure probability within the prescribed bound. The framework thus offers a principled, scalable method for risk-aware sequential decision making in uncertain environments.
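As a rough illustration of the conservative max-prob baseline mentioned above, the following sketch runs value iteration for the maximum goal-reaching probability on a tiny MDP with a dead-end. The states, actions, and transition probabilities are hypothetical and chosen only for illustration; this is not the paper's Case S or Case M construction.

```python
# Toy MDP (hypothetical): transitions[state][action] = [(next_state, prob), ...]
GOAL, DEAD = "goal", "dead"
transitions = {
    "s0": {
        "safe": [("s1", 1.0)],                 # detour through s1
        "risky": [(GOAL, 0.9), (DEAD, 0.1)],   # shortcut with collision risk
    },
    "s1": {
        "go": [(GOAL, 0.98), (DEAD, 0.02)],
    },
}


def max_goal_probability(transitions, tol=1e-12, max_iters=10_000):
    """Value iteration for the largest probability of reaching GOAL."""
    # Non-terminal states start at 0; the goal is worth 1, the dead-end 0.
    value = {s: 0.0 for s in transitions}
    value[GOAL], value[DEAD] = 1.0, 0.0
    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            best = max(
                sum(p * value[t] for t, p in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - value[s]))
            value[s] = best
        if delta < tol:
            break
    return value


value = max_goal_probability(transitions)
# In this acyclic toy model every episode ends in GOAL or DEAD,
# so the minimum failure probability is one minus the value.
min_failure = {s: 1.0 - value[s] for s in transitions}
print(min_failure)
```

In this toy model the max-prob policy always takes the detour, even if the shortcut is much cheaper, which is the kind of conservatism the summary describes.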

Abstract

We solve a sequential decision-making problem under uncertainty that takes into account the failure probability of a task. The stochastic shortest path problem, the standard model for sequential decision-making, cannot handle this setting on its own, so we address it by introducing dead-ends. Conventional approaches consider only policies that minimize the probability of task failure, so the resulting optimal policy can be overly conservative. In this paper, we address this issue by expanding the search to the class of policies whose failure probability is less than a desired threshold. The resulting problem can be solved within the framework of a Bayesian Markov decision process and a two-person zero-sum game. The optimal policy turns out to be expressed as a probability distribution over a set of deterministic policies. We also demonstrate the effectiveness of the proposed methods by applying them to a motion planning problem with obstacle avoidance for a moving robot.
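To see why a probability distribution over deterministic policies can be useful under a failure-probability threshold, consider a minimal sketch that mixes two deterministic policies at the start of each episode. The function name and the failure probabilities below are hypothetical, and the sketch relies only on the fact that the failure probability of such an episode-level mixture is linear in the mixing weight; it is not the paper's derivation of the optimal policy.

```python
def mix_to_threshold(fail_safe, fail_fast, epsilon):
    """Weight on the faster policy so the mixture's failure probability meets epsilon.

    fail_safe, fail_fast: failure probabilities of two deterministic policies
    (hypothetical inputs), with fail_safe <= fail_fast.
    """
    if fail_safe > epsilon:
        raise ValueError("even the safer policy violates the threshold")
    if fail_fast <= epsilon:
        return 1.0  # the faster policy alone already satisfies the threshold
    # Mixture failure probability: w * fail_fast + (1 - w) * fail_safe = epsilon
    return (epsilon - fail_safe) / (fail_fast - fail_safe)


fail_safe, fail_fast, epsilon = 0.02, 0.10, 0.05  # illustrative numbers only
w = mix_to_threshold(fail_safe, fail_fast, epsilon)
mixed_failure = w * fail_fast + (1.0 - w) * fail_safe
print(w, mixed_failure)
```

With these illustrative numbers neither deterministic policy sits exactly at the threshold, but a randomized choice between them does, which lets the riskier and faster policy be used part of the time.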

Paper Structure (32 sections, 16 theorems, 99 equations, 5 figures, 2 tables)

Introduction
Constrained SSP Problem with Failure Probability
Model
Policy
Dead-Ends
Objective Functions
Constraint
Solution to Approximate Problems
MDPs for Objective Function and Constraint
Definitions of $\mathcal{M}^{(S)}_{o}$ and $\mathcal{M}^{(S)}_{d,\gamma}$
Definitions of $\mathcal{M}^{(M)}_{o}$ and $\mathcal{M}^{(M)}_{d,\gamma}$
Two-Person Zero-Sum Game Parameterized by $c$, $\epsilon$ and $\gamma$
Derivation of $J^*_{c,\gamma,\epsilon}$
Derivation of Optimal Policy
Approximate Solution
...and 17 more sections

Key Result

Theorem 1

Given $\pi \in \Pi$, ... Here, $h_d(x')$ equals 1 if $x' \neq 0$ and 0 otherwise.

Figures (5)

Figure 1: Motion planning with obstacle avoidance: Deriving a policy to reach the goal quickly while reducing the probability of collision with obstacles A, B and the wall to $\epsilon=0.05$
Figure 2: Trajectories for 50 episodes when $\pi_{max}^{(S)}$ is applied: The robot takes a large detour to the right of obstacle A in many episodes
Figure 3: Trajectories for 50 episodes when $\pi_{max}^{(M)}$ is applied: The robot takes a large detour to the right of obstacle A in many episodes
Figure 4: Trajectories for 50 episodes when $\pi_{mod}^{(S)}$ is applied: The robot more frequently selects a trajectory that passes to the left of obstacle A, which carries a collision risk but reaches the terminal state in less time than the trajectory under policy $\pi_{max}^{(S)}$
Figure 5: Trajectories for 50 episodes when $\pi_{mod}^{(M)}$ is applied: The robot more frequently selects a trajectory that passes to the left of obstacle A, which carries a collision risk but reaches the terminal state in less time than the trajectory under policy $\pi_{max}^{(M)}$

Theorems & Definitions (22)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Theorem 7
Theorem 8
Theorem 9
Theorem 10
...and 12 more
