Table of Contents
Fetching ...

Taming Infinity one Chunk at a Time: Concisely Represented Strategies in One-Counter MDPs

Michal Ajdarów, James C. A. Main, Petr Novotný, Mickael Randour

TL;DR

This work investigates infinite-state OC-MDPs by introducing interval-based strategies (OEIS and CIS) that yield compact, verifiable representations. It develops a compression framework that transforms potentially infinite Markov chains into finite (or periodically finite) compressed chains, preserving termination and counter-hitting properties and enabling decision procedures within PSPACE. The authors provide comprehensive complexity landscapes: verification for OEIS/CIS in bounded and open-ended settings lies in P^{PosSLP} or co-ETR, while realisability results span NP^{PosSLP}, NP^{ETR}, and PSPACE across various variants; they also establish square-root-sum hardness and NP-hardness for several problems through careful reductions. The results deliver decidability and practical representation benefits for a natural, expressive subclass of strategies in OC-MDPs, with significant implications for synthesis under infinite-state constraints and connections to number-theoretic problems via hardness reductions.

Abstract

Markov decision processes (MDPs) are a canonical model to reason about decision making within a stochastic environment. We study a fundamental class of infinite MDPs: one-counter MDPs (OC-MDPs). They extend finite MDPs via an associated counter taking natural values, thus inducing an infinite MDP over the set of configurations (current state and counter value). We consider two characteristic objectives: reaching a target state (state-reachability), and reaching a target state with counter value zero (selective termination). The synthesis problem for the latter is not known to be decidable and connected to major open problems in number theory. Furthermore, even seemingly simple strategies (e.g., memoryless ones) in OC-MDPs might be impossible to build in practice (due to the underlying infinite configuration space): we need finite, and preferably small, representations. To overcome these obstacles, we introduce two natural classes of concisely represented strategies based on a (possibly infinite) partition of counter values in intervals. For both classes, and both objectives, we study the verification problem (does a given strategy ensure a high enough probability for the objective?), and two synthesis problems (does there exist such a strategy?): one where the interval partition is fixed as input, and one where it is only parameterized. We develop a generic approach based on a compression of the induced infinite MDP that yields decidability in all cases, with all complexities within PSPACE.

Taming Infinity one Chunk at a Time: Concisely Represented Strategies in One-Counter MDPs

TL;DR

This work investigates infinite-state OC-MDPs by introducing interval-based strategies (OEIS and CIS) that yield compact, verifiable representations. It develops a compression framework that transforms potentially infinite Markov chains into finite (or periodically finite) compressed chains, preserving termination and counter-hitting properties and enabling decision procedures within PSPACE. The authors provide comprehensive complexity landscapes: verification for OEIS/CIS in bounded and open-ended settings lies in P^{PosSLP} or co-ETR, while realisability results span NP^{PosSLP}, NP^{ETR}, and PSPACE across various variants; they also establish square-root-sum hardness and NP-hardness for several problems through careful reductions. The results deliver decidability and practical representation benefits for a natural, expressive subclass of strategies in OC-MDPs, with significant implications for synthesis under infinite-state constraints and connections to number-theoretic problems via hardness reductions.

Abstract

Markov decision processes (MDPs) are a canonical model to reason about decision making within a stochastic environment. We study a fundamental class of infinite MDPs: one-counter MDPs (OC-MDPs). They extend finite MDPs via an associated counter taking natural values, thus inducing an infinite MDP over the set of configurations (current state and counter value). We consider two characteristic objectives: reaching a target state (state-reachability), and reaching a target state with counter value zero (selective termination). The synthesis problem for the latter is not known to be decidable and connected to major open problems in number theory. Furthermore, even seemingly simple strategies (e.g., memoryless ones) in OC-MDPs might be impossible to build in practice (due to the underlying infinite configuration space): we need finite, and preferably small, representations. To overcome these obstacles, we introduce two natural classes of concisely represented strategies based on a (possibly infinite) partition of counter values in intervals. For both classes, and both objectives, we study the verification problem (does a given strategy ensure a high enough probability for the objective?), and two synthesis problems (does there exist such a strategy?): one where the interval partition is fixed as input, and one where it is only parameterized. We develop a generic approach based on a compression of the induced infinite MDP that yields decidability in all cases, with all complexities within PSPACE.

Paper Structure

This paper contains 29 sections, 39 theorems, 12 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Lemma 10

Let $I=\llbracket{}b^-, b^+\rrbracket{}$ be a bounded interval of $\mathbb{N}_{>0}$. The interval partition $\mathsf{Refine}(I)$ of $I$ obtained via Algorithm algorithm:refine:intervals satisfies $|\mathsf{Refine}(I)|\leq\log_2(|I|+1)+1\leq\log_2(b^+)+1$ and, for all $J\in\mathsf{Refine}(I)$, we hav

Figures (6)

  • Figure 1: An OC-MDP. Edge splits following actions indicate probabilistic transitions. We indicate the weight of a state-action pair next to each action.
  • Figure 2: An OC-MDP where all weights are $-1$ and are omitted from the figure.
  • Figure 3: An illustration of an OC-MDP and its compression for a specific strategy.
  • Figure 4: An illustration of the counter update scheme in a compressed Markov chain for the interval $\llbracket{}1, 7\rrbracket{}$.
  • Figure 5: Fragments of Markov chain used to derive a characterisation for transition probabilities of $\mathcal{C}^{{\sigma_{}}}_{\mathcal{I}}$ for a bounded interval of the form $\llbracket{}1, 2^\beta-1\rrbracket{}$.
  • ...and 1 more figures

Theorems & Definitions (48)

  • Remark 1
  • Remark 2
  • Remark 3
  • Example 4
  • Remark 5
  • Example 6
  • Example 7
  • Remark 9
  • Lemma 10
  • Theorem 11
  • ...and 38 more