Efficient Constraint Generation for Stochastic Shortest Path Problems

Johannes Schmalz, Felipe Trevizan

TL;DR

This work identifies a bottleneck in SSP solvers: backing up every applicable action in each state wastes computation on suboptimal actions. By recasting heuristic search as linear programming with constraint generation, the authors introduce CG-iLAO$^*$, which uses a tight separation oracle to add only violated action constraints and therefore ignores inactive actions. In the reported experiments, CG-iLAO$^*$ ignores up to 57% of the actions that iLAO$^*$ considers and solves problems up to 8x faster than LRTDP and up to 3x faster than iLAO$^*$, with correspondingly fewer $Q$-value evaluations. The work connects planning and operations research techniques, evaluates the method across multiple SSP domains and heuristics, and offers guidance on when and how to use it in practice.
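
For readers unfamiliar with the linear-programming view mentioned above, the following is a sketch of the standard LP formulation of an SSP from the operations research literature. It is not copied from the paper and may differ from the authors' exact formulation. The symbols $s_0$ (initial state), $\mathsf{G}$ (goal states), $\mathsf{S}$ (states), $C$ (cost function) and $P$ (transition probabilities) are assumed notation; $\mathsf{A}(s)$ and $\widehat{\mathsf{A}}(s)$ follow the figure captions on this page.

```latex
\begin{align*}
\max_{V} \quad & V(s_0) \\
\text{s.t.} \quad
  & V(s) \le C(s,a) + \sum_{s' \in \mathsf{S}} P(s' \mid s,a)\, V(s')
    && \forall s \in \mathsf{S} \setminus \mathsf{G},\ a \in \mathsf{A}(s) \\
  & V(g) = 0 && \forall g \in \mathsf{G}
\end{align*}
% Writing Q(s,a) = C(s,a) + \sum_{s'} P(s' \mid s,a) V(s'),
% the constraint for the pair (s,a) is violated by V exactly when V(s) > Q(s,a).
```

Under this reading, constraint generation keeps only the constraints for a subset $\widehat{\mathsf{A}}(s) \subseteq \mathsf{A}(s)$ of each state's actions, and a separation oracle is any procedure that finds pairs $(s,a)$ whose constraint the current $V$ violates.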

Abstract

Current methods for solving Stochastic Shortest Path Problems (SSPs) find states' costs-to-go by applying Bellman backups, where state-of-the-art methods employ heuristics to select states to back up and prune. A fundamental limitation of these algorithms is their need to compute the cost-to-go for every applicable action during each state backup, leading to unnecessary computation for actions identified as sub-optimal. We present new connections between planning and operations research and, using this framework, we address this issue of unnecessary computation by introducing an efficient version of constraint generation for SSPs. This technique allows algorithms to ignore sub-optimal actions and avoid computing their costs-to-go. We also apply our novel technique to iLAO*, resulting in a new algorithm, CG-iLAO*. Our experiments show that CG-iLAO* ignores up to 57% of iLAO*'s actions and solves problems up to 8x and 3x faster than LRTDP and iLAO*, respectively.
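
To make the cost of a full Bellman backup concrete, here is a minimal Python sketch on a hypothetical three-state SSP. It is not the authors' CG-iLAO* implementation. The toy problem, the function names (`q_value`, `full_backup`, `violated_actions`, `value_iteration`) and the tolerance are all assumptions made for illustration. `full_backup` computes a Q-value for every applicable action, which is the work the abstract describes as wasteful, while `violated_actions` shows the simple check "is V(s) > Q(s, a)?" that identifies which action constraints a given value function violates.

```python
from typing import Dict, List, Tuple

# Hypothetical toy SSP: state -> action -> (cost, {successor: probability}).
Action = Tuple[float, Dict[str, float]]
GOAL = "g"
SSP: Dict[str, Dict[str, Action]] = {
    "s0": {
        "safe": (1.0, {"s1": 1.0}),
        "risky": (1.0, {GOAL: 0.5, "s0": 0.5}),
        "slow": (5.0, {GOAL: 1.0}),
    },
    "s1": {
        "go": (2.0, {GOAL: 1.0}),
    },
}


def q_value(V: Dict[str, float], cost: float, outcomes: Dict[str, float]) -> float:
    """Q(s, a) = C(s, a) + sum over s' of P(s' | s, a) * V(s')."""
    return cost + sum(p * V[s2] for s2, p in outcomes.items())


def full_backup(V: Dict[str, float], s: str) -> Tuple[float, str]:
    """Bellman backup that evaluates the Q-value of every applicable action."""
    qs = {a: q_value(V, c, out) for a, (c, out) in SSP[s].items()}
    best = min(qs, key=qs.get)
    return qs[best], best


def violated_actions(V: Dict[str, float], s: str, tol: float = 1e-9) -> List[str]:
    """Actions whose LP constraint V(s) <= Q(s, a) is violated by V."""
    return [
        a for a, (c, out) in SSP[s].items()
        if V[s] > q_value(V, c, out) + tol
    ]


def value_iteration(sweeps: int = 200) -> Dict[str, float]:
    """Plain in-place value iteration using full backups, starting from zero."""
    V = {s: 0.0 for s in SSP}
    V[GOAL] = 0.0  # goal states have zero cost-to-go
    for _ in range(sweeps):
        for s in SSP:
            V[s], _best = full_backup(V, s)
    return V


if __name__ == "__main__":
    V = value_iteration()
    print("converged values:", V)
    print("greedy action in s0:", full_backup(V, "s0")[1])
    # An overestimate of the cost-to-go violates some, but not all, constraints.
    V_over = {"s0": 4.0, "s1": 2.0, GOAL: 0.0}
    print("violated at s0:", violated_actions(V_over, "s0"))
```

In this toy problem the action `slow` is never violated by either value function, so a method that only adds violated constraints would never need to track it. That is the intuition behind ignoring sub-optimal actions, although the paper's separation oracle is designed to avoid even the naive per-action scan used in `violated_actions` here.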

Paper Structure

This paper contains 18 sections, 3 theorems, 1 equation, 6 figures, 8 tables, 2 algorithms.

Key Result

Theorem 1

CG-iLAO$^*$ terminates.

Figures (6)

  • Figure 1: An SSP where CG-iLAO$^*$'s value function is not monotonically non-decreasing.
  • Figure 2: Triangle Tire World problems 1 (left) and 2 (right).
  • Figure 3: For each algorithm and heuristic, the cumulative plot of how many instances were solved w.r.t. time in seconds (left) and number of $Q$-values (right). Both plots start at 200 solved instances and (right) is cut off at $4 \times 10^8$ $Q$-values.
  • Figure 4: Cumulative plot of state density, i.e., $|\widehat{\mathsf{A}}(s)| / |\mathsf{A}(s)|$ over 50 instances for the largest solved problem per domain.
  • Figure 5: Search time (excludes compute time for the heuristic) of algorithm with $h^{\text{pert}}_w$ as $w$ varies. We show mean and 95% C.I. of all considered problems over 50 instances.
  • ...and 1 more figure

Theorems & Definitions (9)

  • Definition 1: $\epsilon\text{-consistency}$
  • Definition 2: Partial SSP
  • Definition 3: Inactive Action
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Theorem 2
  • proof