On reachability of Markov decision processes: a novel state-classification-based PI approach
Yanyun Li, Xin Guo, Xianping Guo
TL;DR
The paper addresses minimizing the probability of reaching a failure set $B$ in finite-state, finite-action MDPs without using reward or cost criteria. It introduces the concept of $B^c$-absorbing sets and the largest absorbing set $F_*$, then derives an improved optimality equation on $G_*=B^c\setminus F_*$ with a unique solution. A two-stage, state-classification-based policy iteration is developed (Algorithms 1–2): Algorithm 1 computes $F_*$ and the restricted policy class, while Algorithm 2 performs finite-time PI on $\Pi_d^s(F_*)$ using a single $|G_*|$-variable evaluation per iteration. The approach offers computational advantages over prior average-MDP PI methods and is demonstrated on a reliability/maintenance example, yielding explicit policy prescriptions and closed-form minimal reaching probabilities. Together, these contributions provide a practical, finite-time solution to reachability optimization in finite MDPs with direct reliability implications.
Abstract
This paper studies the reliability of a discrete-time controlled Markov system with finite state and action spaces, and aims to give an efficient algorithm for computing an optimal (control) policy that maximizes the system's reliability for every initial state. After establishing the existence of an optimal policy, we introduce the concept of an absorbing set of a stationary policy and give a characterization of, and a computational method for, such absorbing sets. Using the largest absorbing set, we build a novel optimality equation (OE) and prove that its solution is unique. Furthermore, we provide a policy iteration algorithm for computing optimal policies, and prove that an optimal policy and the maximal reliability are obtained in a finite number of iterations. Finally, an example from reliability and maintenance is given to illustrate our results.
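
The two-stage idea summarized above (first find the largest set of safe states that some policy never leaves, then run policy iteration on the remaining states) can be illustrated with a minimal Python sketch. This is not the paper's Algorithms 1 and 2: the function names, the array layout `P[a, i, j]`, the assumption that every action is available in every state, the greatest-fixed-point pruning loop, and the toy four-state model are all assumptions made here for illustration.

```python
import numpy as np

def largest_absorbing_set(P, B, tol=1e-12):
    """Largest set F inside the complement of B such that every state of F
    has some action keeping all probability mass inside F.
    P[a, i, j] is the (assumed) probability of moving i -> j under action a."""
    n_actions, n_states, _ = P.shape
    F = set(range(n_states)) - set(B)
    while True:
        idx = np.array(sorted(F), dtype=int)
        keep = {i for i in F
                if any(P[a, i, idx].sum() >= 1.0 - tol for a in range(n_actions))}
        if keep == F:          # fixed point reached: nothing more to prune
            return F
        F = keep

def min_reach_policy_iteration(P, B):
    """Hypothetical helper: policy iteration for the minimal probability of
    ever reaching B, restricted to G = complement of B minus F."""
    n_actions, n_states, _ = P.shape
    B = sorted(set(B))
    F = largest_absorbing_set(P, B)
    G = [i for i in range(n_states) if i not in F and i not in B]
    if not G:
        return F, {}, {}
    hit = P[:, G][:, :, B].sum(axis=2)   # one-step mass into B, shape (A, |G|)
    stay = P[:, G][:, :, G]              # mass staying in G, shape (A, |G|, |G|)
    rows = np.arange(len(G))
    policy = np.zeros(len(G), dtype=int)
    while True:
        # Evaluation: solve (I - P_GG) v = one-step hit probabilities.
        # States in F contribute 0, since an F-preserving action never hits B.
        # In this sketch the system is nonsingular because no policy can
        # remain inside G forever (otherwise F would not be the largest set).
        P_GG = stay[policy, rows]
        v = np.linalg.solve(np.eye(len(G)) - P_GG, hit[policy, rows])
        # Improvement: switch only where an action is strictly better.
        q = hit + stay @ v
        best = q.argmin(axis=0)
        improved = q[best, rows] < v - 1e-12
        if not improved.any():
            break
        policy = np.where(improved, best, policy)
    return F, dict(zip(G, policy.tolist())), dict(zip(G, v.tolist()))

# Toy model (invented for this sketch): states 0..3, failure set B = {3}.
P = np.zeros((2, 4, 4))
P[0, 0, 1] = 0.5; P[0, 0, 3] = 0.5   # state 0, action 0
P[1, 0, 2] = 0.5; P[1, 0, 3] = 0.5   # state 0, action 1
P[0, 1, 0] = 1.0                     # state 1, action 0
P[1, 1, 2] = 0.8; P[1, 1, 3] = 0.2   # state 1, action 1
P[0, 2, 2] = 1.0                     # state 2, action 0: stay safe forever
P[1, 2, 3] = 1.0                     # state 2, action 1: fail
P[:, 3, 3] = 1.0                     # failure state is absorbing

F_star, policy, v = min_reach_policy_iteration(P, B=[3])
```

On this toy model the pruning loop leaves only state 2 in the safe set, and the iteration settles on action 1 in both remaining states, with minimal reaching probabilities of about 0.5 from state 0 and 0.2 from state 1. Each evaluation step solves one linear system whose size is the number of states in $G_*$, which mirrors the single $|G_*|$-variable evaluation per iteration mentioned in the summary.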
