On reachability of Markov decision processes: a novel state-classification-based PI approach

Yanyun Li; Xin Guo; Xianping Guo

TL;DR

The paper addresses minimizing the probability of reaching a failure set in finite-state, finite-action MDPs without using reward or cost criteria. It introduces the concept of $B^c$-absorbing sets and the largest absorbing set $F_*$, then derives an improved optimality equation on $G_*=B^c\setminus F_*$ with a unique solution. A two-stage, state-classification-based policy iteration is developed (Algorithms 1–2): Algorithm 1 computes $F_*$ and the restricted policy class, while Algorithm 2 performs finite-time PI on $\Pi_d^s(F_*)$ using a single $|G_*|$-variable evaluation per iteration. The approach offers computational advantages over prior average-MDP PI methods and is demonstrated on a reliability/maintenance example, yielding explicit policy prescriptions and closed-form minimal reaching probabilities. Together, these contributions provide a practical, finite-time solution to reachability optimization in finite MDPs with direct reliability implications.
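The two-stage idea summarized above can be illustrated with a minimal Python sketch. It is not the paper's Algorithms 1–2: stage 1 uses a standard fixed-point pruning to find the largest subset of $B^c$ in which every state has an action that stays inside the subset, and stage 2 runs a textbook policy iteration on the remaining states, solving one linear system in $|G_*|$ unknowns per iteration. The function names, the dictionary encoding `P[state][action] = {next_state: probability}`, and the four-state maintenance-style example are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def largest_absorbing_set(P, B):
    """Stage 1 (assumed pruning scheme): largest F inside B^c such that every
    state in F has an action whose successors all lie in F."""
    F = set(P) - set(B)
    changed = True
    while changed:
        changed = False
        for i in list(F):
            stays = any(all(j in F for j, p in nxt.items() if p > 0)
                        for nxt in P[i].values())
            if not stays:
                F.discard(i)
                changed = True
    return F

def solve(A, b):
    """Gauss-Jordan elimination for a square nonsingular system A x = b."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def q_value(P, B, v, i, a):
    # Successors in B count as 1, successors in G_* as v[j], successors in F_* as 0.
    return sum(p * (1 if j in B else v.get(j, 0)) for j, p in P[i][a].items())

def evaluate(P, B, G, f):
    """Reaching probabilities of policy f on G_*: solve (I - P_GG) v = P_GB."""
    A = [[Fraction(int(i == j)) - P[i][f[i]].get(j, 0) for j in G] for i in G]
    b = [Fraction(sum(p for j, p in P[i][f[i]].items() if j in B)) for i in G]
    return dict(zip(G, solve(A, b)))

def policy_iteration(P, B):
    """Stage 2: policy iteration restricted to G_* = B^c minus F_*."""
    B = set(B)
    F = largest_absorbing_set(P, B)
    G = sorted(set(P) - B - F)
    f = {i: next(iter(P[i])) for i in G}  # arbitrary initial policy on G_*
    while True:
        v = evaluate(P, B, G, f)
        improved = False
        for i in G:
            best = min(P[i], key=lambda a: q_value(P, B, v, i, a))
            if q_value(P, B, v, i, best) < v[i]:  # switch only on strict improvement
                f[i], improved = best, True
        if not improved:
            return F, f, v

# Hypothetical toy model; "failed" is the failure set B.
h, q4, t = Fraction(1, 2), Fraction(1, 4), Fraction(1, 10)
P = {
    "good": {"run": {"good": h, "worn": h},
             "retire": {"idle": 1 - t, "failed": t}},
    "worn": {"run": {"worn": h, "failed": h},
             "repair": {"good": h, "idle": q4, "failed": q4}},
    "idle": {"stay": {"idle": Fraction(1)}},
}
F, f, v = policy_iteration(P, {"failed"})
print(F, f, v)
```

In this toy model only `idle` survives the pruning, so the linear systems have two unknowns. Exact `Fraction` arithmetic is used so that the strict-improvement test is not affected by rounding; with floating-point inputs a small tolerance would be needed instead.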

Abstract

This paper studies the reliability of a discrete-time controlled Markov system with finite states and actions, and aims to give an efficient algorithm for computing an optimal (control) policy that maximizes the system's reliability for every initial state. After establishing the existence of an optimal policy, we introduce the concept of an absorbing set of a stationary policy, and give characterizations of the absorbing sets together with a method for computing them. Using the largest absorbing set, we build a novel optimality equation (OE) and prove that its solution is unique. Furthermore, we provide a policy iteration algorithm for optimal policies, and prove that an optimal policy and the maximal reliability can be obtained in a finite number of iterations. Finally, an example from reliability and maintenance problems illustrates our results.
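In the TL;DR's notation ($B$ the failure set, $F_*$ the largest $B^c$-absorbing set, $G_*=B^c\setminus F_*$), an optimality equation of the kind the abstract describes can be sketched as follows. The symbols $v$, $A(i)$ and $p(j\mid i,a)$ are assumptions introduced here for illustration, and the paper's exact statement may differ, for example in which actions are admissible at each state.

$$
v(i)=\min_{a\in A(i)}\Big\{\sum_{j\in B}p(j\mid i,a)+\sum_{j\in G_*}p(j\mid i,a)\,v(j)\Big\},\qquad i\in G_*,
$$

with the conventions $v=1$ on $B$ and $v=0$ on $F_*$. Under this reading, $v(i)$ is the minimal probability of ever reaching $B$ from $i$, and the maximal reliability from $i$ would be $1-v(i)$.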

Paper Structure (7 sections, 12 theorems, 57 equations)

Introduction
Problem Statement
Preliminaries
On $B^c$-absorbing sets of stationary policies
A state-classification-based PI algorithm of optimal policies
Application to maintenance problems
Conclusion

Key Result

Lemma 3.1

Theorems & Definitions (20)

Definition 2.1
Definition 2.2
Lemma 3.1
Proposition 3.1
Remark 3.1
Example 3.1
Definition 4.1
Lemma 4.1
Lemma 4.2
Theorem 4.1
...and 10 more