Positivity-hardness results on Markov decision processes

Jakob Piribauer; Christel Baier

TL;DR

This work studies optimization problems on finite-state Markov decision processes (MDPs), which combine non-determinism with probabilistic transitions, focusing on one-counter and integer-weighted variants. The authors introduce a modular gadget framework that encodes linear recurrence sequences in the optimal values of various objectives, and then reduce the Positivity problem in polynomial time to the threshold problems for these objectives (e.g., termination probabilities, satisfaction probabilities of energy objectives, non-classical stochastic shortest path problems (SSPPs), and the conditional value-at-risk (CVaR)). The main contribution is a suite of Positivity-hardness results: a decision procedure for any of these threshold problems would also decide the Positivity problem, whose decidability status has been open for decades and whose decidability would have far-reaching consequences in analytic number theory. The reductions rely on carefully constructed MDP gadgets that encode both the recurrence relation and the initial values of a sequence, forcing schedulers to make decisions that reflect the sign of the recurrence terms. Together, these results tie the algorithmic tractability of a broad class of optimization problems on MDPs to a fundamental number-theoretic problem. They do not establish undecidability, but they show that an algorithmic solution to any of these problems is not possible without a major breakthrough in analytic number theory.

Abstract

This paper investigates a series of optimization problems for one-counter Markov decision processes (MDPs) and integer-weighted MDPs with finite state space. Specifically, it considers problems addressing termination probabilities and expected termination times for one-counter MDPs, as well as satisfaction probabilities of energy objectives, conditional and partial expectations, satisfaction probabilities of constraints on the total accumulated weight, the computation of quantiles for the accumulated weight, and the conditional value-at-risk for accumulated weights for integer-weighted MDPs. Although algorithmic results are available for some special instances, the decidability status of the decision versions of these problems is unknown in general. The paper demonstrates that these optimization problems are inherently mathematically difficult by providing polynomial-time reductions from the Positivity problem for linear recurrence sequences. This is a well-known number-theoretic problem whose decidability status has been open for decades, and it is known that decidability of the Positivity problem would have far-reaching consequences in analytic number theory. The reductions presented in the paper therefore show that an algorithmic solution to any of the investigated problems is not possible without a major breakthrough in analytic number theory. The reductions rely on the construction of MDP-gadgets that encode the initial values and linear recurrence relations of linear recurrence sequences. These gadgets can flexibly be adjusted to prove the various Positivity-hardness results.
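To make the source problem of the reductions concrete, the following is a minimal sketch and not part of the paper. It assumes the standard formulation of the Positivity problem: given rational coefficients and initial values of a linear recurrence sequence $(u_n)$, decide whether $u_n \geq 0$ for all $n$. The function name `first_negative_index` and its parameters are hypothetical. The sketch only searches up to a finite bound, so it can refute positivity by finding a negative term but can never confirm it.

```python
from fractions import Fraction
from typing import Optional, Sequence, Union

Rational = Union[int, Fraction]


def first_negative_index(
    coeffs: Sequence[Rational],
    initial: Sequence[Rational],
    bound: int,
) -> Optional[int]:
    """Return the smallest n < bound with u_n < 0, or None if there is none.

    Assumed recurrence of depth k = len(coeffs):
        u_{n+k} = coeffs[0]*u_{n+k-1} + ... + coeffs[k-1]*u_n
    with u_0, ..., u_{k-1} given by `initial`.
    """
    k = len(coeffs)
    if k == 0 or len(initial) != k:
        raise ValueError("need k >= 1 coefficients and exactly k initial values")

    # Sliding window holding u_n, ..., u_{n+k-1} as exact rationals.
    window = [Fraction(x) for x in initial]
    for n in range(bound):
        if window[0] < 0:
            return n
        # Pair coeffs[0] with the newest term and coeffs[k-1] with the oldest.
        nxt = sum(Fraction(c) * u for c, u in zip(coeffs, reversed(window)))
        window = window[1:] + [nxt]
    return None


if __name__ == "__main__":
    # Fibonacci numbers: no negative term among the first 50.
    print(first_negative_index([1, 1], [0, 1], 50))
    # u_{n+2} = u_{n+1} - u_n with u_0 = u_1 = 1 gives 1, 1, 0, -1, ...
    print(first_negative_index([1, -1], [1, 1], 50))
```

A result of `None` says nothing about terms beyond `bound`. No general bound on where the first negative term must appear is known, which is one way to see why the decidability of the problem is open.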

Paper Structure (39 sections, 17 theorems, 40 equations, 9 figures)

Introduction
Positivity problem
Problems under investigation and related work on these problems
Energy objectives, one-counter MDPs, and quantiles.
Non-classical stochastic shortest path problems (SSPPs).
Contribution
Main result.
Related work on Skolem- and Positivity-hardness in verification
Outline
Preliminaries
Markov decision process.
Scheduler.
Probability measure.
Classical stochastic shortest path problem.
Outline of the Positivity-hardness proofs
...and 24 more sections

Key Result

Theorem 4.1

The Positivity problem is reducible in polynomial time to the following problems: Given an MDP $\mathcal{M}$ and a rational $\vartheta\in(0,1)$,

Figures (9)

Figure 4: Overview of the dependencies between the Positivity-hardness results. The squares refer to the threshold problems for the respective quantities.
Figure 5: Interplay between the MDP-gadgets.
Figure 6: The initial gadget $\mathcal{I}$.
Figure 7: The gadget $\mathsf{G}_{\bar{\alpha}}$ to encode linear recurrence relations. The example here is depicted for a linear recurrence of depth $2$ with $\alpha_1\geq 0$ and $\alpha_2< 0$. The outgoing actions $\gamma_i$ and $\delta_i$ lead to the gadget encoding initial values as depicted in Figure \ref{fig:structure_gadgets}.
Figure 8: Gadget $\mathcal{O}_{\bar{\beta}}$ encoding initial values of a linear recurrence sequence in terms of maximal termination probabilities of one-counter MDPs.
...and 4 more figures

Theorems & Definitions (19)

Definition 1.1: Positivity problem
Theorem 4.1
Corollary 4.2
Lemma 4.3
Lemma 4.4
Corollary 4.5
Remark 4.6
Corollary 4.7
Corollary 4.8
Corollary 4.9
...and 9 more
