Table of Contents
Fetching ...

Widest Path Games and Maximality Inheritance in Bounded Value Iteration for Stochastic Games

Kittiphon Phalakarn, Yun Chen Tsai, Ichiro Hasuo

TL;DR

This paper identifies and formalizes the core principles underlying the widest path-based BVI approach by presenting 2WP-BVI, a clean BVI algorithm based on (2-player) widest path games, and proving its correctness using what is called the maximality inheritance principle.

Abstract

For model checking stochastic games (SGs), bounded value iteration (BVI) algorithms have gained attention as efficient approximate methods with rigorous precision guarantees. However, BVI may not terminate or converge when the target SG contains end components. Most existing approaches address this issue by explicitly detecting and processing end components--a process that is often computationally expensive. An exception is the widest path-based BVI approach previously studied by Phalakarn et al., which we refer to as 1WP-BVI. The method performs particularly well in the presence of numerous end components. Nonetheless, its theoretical foundations remain somewhat ad hoc. In this paper, we identify and formalize the core principles underlying the widest path-based BVI approach by (i) presenting 2WP-BVI, a clean BVI algorithm based on (2-player) widest path games, and (ii) proving its correctness using what we call the maximality inheritance principle--a proof principle previously employed in a well-known result in probabilistic model checking. Our experimental results demonstrate the practical relevance and potential of our proposed 2WP-BVI algorithm.

Widest Path Games and Maximality Inheritance in Bounded Value Iteration for Stochastic Games

TL;DR

This paper identifies and formalizes the core principles underlying the widest path-based BVI approach by presenting 2WP-BVI, a clean BVI algorithm based on (2-player) widest path games, and proving its correctness using what is called the maximality inheritance principle.

Abstract

For model checking stochastic games (SGs), bounded value iteration (BVI) algorithms have gained attention as efficient approximate methods with rigorous precision guarantees. However, BVI may not terminate or converge when the target SG contains end components. Most existing approaches address this issue by explicitly detecting and processing end components--a process that is often computationally expensive. An exception is the widest path-based BVI approach previously studied by Phalakarn et al., which we refer to as 1WP-BVI. The method performs particularly well in the presence of numerous end components. Nonetheless, its theoretical foundations remain somewhat ad hoc. In this paper, we identify and formalize the core principles underlying the widest path-based BVI approach by (i) presenting 2WP-BVI, a clean BVI algorithm based on (2-player) widest path games, and (ii) proving its correctness using what we call the maximality inheritance principle--a proof principle previously employed in a well-known result in probabilistic model checking. Our experimental results demonstrate the practical relevance and potential of our proposed 2WP-BVI algorithm.

Paper Structure

This paper contains 16 sections, 10 theorems, 10 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Theorem 2.6

Assume the setting of def:CCseq. The bottom-up Cousot--Cousot sequence stabilizes, i.e., there is an ordinal $\alpha_0$ such that ${\mathrm{\Gamma}}^{\alpha_0}\bot = {\mathrm{\Gamma}}^{\alpha_0+1}\bot = \cdots$. Its limit ${\mathrm{\Gamma}}^{\alpha_0}\bot$ is the least fixed point (lfp) $\mu{\mathrm

Figures (2)

  • Figure 1: Left: an example of SG. The labels $1$ for $\delta(s,a,s') = 1$ are omitted. Right: the WPG constructed from the SG and $u : s_i \mapsto \text{$i$/5}$ for $0 \leq i \leq 5$. Each $(s,a)$ is labeled by its width $\phi_u(s,a)$. Players' optimal strategies for the widest path objective are in bold.
  • Figure 2: Left: an infinite-state SG giving $\mu{\mathrm{\Phi}} \prec \nu{\mathrm{\Phi}}$ and $\mu{\mathrm{\Omega}} \prec \nu{\mathrm{\Omega}}$. Right: the benchmark ECchain ($N$ is the model parameter). The labels $1$ for $\delta(s,a,s') = 1$ are omitted.

Theorems & Definitions (35)

  • Definition 2.2: stochastic game ${\mathcal{G}}$
  • Definition 2.3: (reachability) value function $V$
  • Example 2.4
  • Definition 2.5: Cousot--Cousot sequence
  • Theorem 2.6: Cousot--Cousot DBLP:conf/popl/CousotC77
  • Theorem 2.7: (special case of) Kleene DBLP:journals/dm/Baranga91
  • Definition 2.9: state-action expectation $\phi_f(s,a)$
  • Definition 2.10: (reachability) Bellman operator ${\mathrm{\Phi}}$
  • Theorem 2.11
  • Example 2.12
  • ...and 25 more