Efficient Constraint Generation for Stochastic Shortest Path Problems

Johannes Schmalz, Felipe Trevizan

Abstract

Stochastic Shortest Path problems (SSPs) are traditionally solved by computing each state's cost-to-go with Bellman backups. A Bellman backup updates a state's cost-to-go by iterating through every applicable action, computing the cost-to-go after applying each one, and selecting a minimal action's cost-to-go. State-of-the-art algorithms use heuristic functions; these give an initial estimate of costs-to-go and let the algorithm apply Bellman backups only to promising states, that is, states with low estimated costs-to-go. However, each Bellman backup still considers all applicable actions, even if the heuristic tells us that some of these actions are too expensive, so such algorithms waste time on unhelpful actions. To address this gap, we present a technique that uses the heuristic to avoid expensive actions, by reframing heuristic search in terms of linear programming and introducing an efficient implementation of constraint generation for SSPs. We present CG-iLAO*, a new algorithm that adapts iLAO* with our novel technique and considers only 40% of iLAO*'s actions on many problems, and as few as 1% on some. Consequently, CG-iLAO* computes on average 3.5x fewer costs-to-go for actions than the state-of-the-art iLAO* and LRTDP, enabling it to solve problems on average 2.8x and 3.7x faster, respectively.
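The Bellman backup described in the abstract can be illustrated with a minimal sketch. The dictionary encoding of the SSP, the state and action names, and the helper names `q_value` and `bellman_backup` are all hypothetical and chosen only for illustration; this is a plain Bellman backup, not the paper's CG-iLAO* algorithm.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy encoding (an assumption, not from the paper):
# transitions[state][action] = (cost, [(probability, successor), ...])
Outcomes = List[Tuple[float, str]]
Transitions = Dict[str, Dict[str, Tuple[float, Outcomes]]]


def q_value(V: Dict[str, float], cost: float, outcomes: Outcomes) -> float:
    """Cost-to-go of one action: its cost plus the expected successor cost-to-go."""
    return cost + sum(p * V[succ] for p, succ in outcomes)


def bellman_backup(
    V: Dict[str, float], transitions: Transitions, state: str
) -> Tuple[float, Optional[str], int]:
    """Return (minimal cost-to-go, a minimising action, number of actions evaluated)."""
    best_value, best_action = float("inf"), None
    evaluated = 0
    for action, (cost, outcomes) in transitions[state].items():
        q = q_value(V, cost, outcomes)  # every applicable action is evaluated
        evaluated += 1
        if q < best_value:
            best_value, best_action = q, action
    return best_value, best_action, evaluated


# Toy problem: "detour" is clearly expensive, yet the backup still evaluates it.
transitions: Transitions = {
    "s0": {
        "safe": (1.0, [(1.0, "s1")]),
        "risky": (1.0, [(0.5, "goal"), (0.5, "s0")]),
        "detour": (10.0, [(1.0, "goal")]),
    },
    "s1": {"finish": (1.0, [(1.0, "goal")])},
}
V = {"s0": 0.0, "s1": 1.0, "goal": 0.0}  # current cost-to-go estimates

value, action, evaluated = bellman_backup(V, transitions, "s0")
print(value, action, evaluated)
```

In this toy instance the backup computes a cost-to-go for all three actions at `s0`, including the expensive `detour`. This is the kind of per-action work that the abstract says the proposed technique aims to avoid.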

Paper Structure

This paper contains 29 sections, 12 theorems, 17 equations, 16 figures, 9 tables, 3 algorithms.

Key Result

Theorem 1

Consider a globally $\epsilon\text{-consistent}$ $V$. Then, …

Figures (16)

  • Figure 1: A gridworld probabilistic navigation problem and the states and actions that $\text{iLAO}^*$ considers on this problem after 2 iterations. This example was taken from Hansen2001:ilao.
  • Figure 2: SSPs that violate \ref{assump:infinite-improper}, and consequently have optimal value functions that induce suboptimal policies. One SSP violates it with a zero-cost cycle, and the other violates it by having a state with no applicable actions.
  • Figure 3: An SSP with a value function that is $\epsilon\text{-consistent}$ but inadmissible, and whose greedy policy is suboptimal.
  • Figure 4: An SSP where the solution of \ref{lp:vi} encodes an optimal policy, but does not give the optimal value function at $s_1$.
  • Figure 5: Example SSP that we solve with $\text{iLAO}^*$ under the lens of constraint and variable generation. The values $H(s)$ are given at the bottom of each node.
  • ...and 11 more figures

Theorems & Definitions (36)

  • Definition 1: Stochastic Shortest Path problem (SSP) Bertsekas1991:SSPs
  • Definition 2: Closed and Open Policies
  • Definition 3: Proper and Improper Policies
  • Definition 4: Bellman Backup
  • Definition 5: Global $\epsilon\text{-consistency}$
  • Definition 6
  • Theorem 1: VI Error Mausam2012:MDPs
  • Definition 7: $\epsilon\text{-consistency}$ Bonet2003:hdp
  • Definition 8: Admissible Value Function
  • Definition 9: Monotonic Value Function
  • ...and 26 more