Geometric Re-Analysis of Classical MDP Solving Algorithms

Arsenii Mustafin; Aleksei Pakharev; Alex Olshevsky; Ioannis Ch. Paschalidis

TL;DR

This work studies the convergence of Value Iteration (VI) and Policy Iteration (PI) for finite MDPs through a geometric interpretation, introducing a transformation of the discount factor that preserves the dynamics and yields an effective discount $\gamma_{\rm eff}$. It reveals a rotation component in VI and proves that, when the Markov Reward Process (MRP) induced by the optimal policy is irreducible and aperiodic, VI converges at a rate strictly faster than the standard bound $\gamma$, with bounds involving the mixing rate $\tau$. An analysis of 2-state MDPs shows that PI converges in at most as many iterations as there are actions, and the paper derives improved VI iteration counts in terms of $\tau^{1/N}$, along with simplified geometric proofs. Overall, the paper provides a new analytical framework for VI and PI, offering improved convergence guarantees and guidance for geometry-informed algorithm design in MDPs.
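
The 2-state policy-iteration result can be explored empirically. The following is a minimal sketch, not the paper's construction: it assumes Howard's variant of PI (every state switches to a strictly better greedy action at the same time), a tabular MDP stored as NumPy arrays `P[a, s, s2]` and `R[a, s]`, and a hypothetical helper name `policy_iteration`. It counts improvement steps on random 2-state MDPs and reports the largest count observed; it does not check the paper's bound.

```python
import numpy as np

def policy_iteration(P, R, gamma):
    """Howard-style policy iteration (all states switch at once).

    P[a, s, s2] = Pr(s2 | s, a); R[a, s] = expected reward of action a in state s.
    Returns (optimal values, optimal policy, number of improvement steps).
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    pi = np.zeros(n_states, dtype=int)          # initial policy: action 0 everywhere
    steps = 0
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi, r_pi = P[pi, states], R[pi, states]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: switch only where another action is strictly better,
        # which prevents cycling between tied actions.
        q = R + gamma * (P @ v)                 # q[a, s]
        best = q.argmax(axis=0)
        improve = q[best, states] > q[pi, states] + 1e-12
        if not improve.any():
            return v, pi, steps
        pi = np.where(improve, best, pi)
        steps += 1

# Count improvement steps over many random 2-state MDPs (illustrative only).
rng = np.random.default_rng(0)
gamma, n_actions = 0.9, 5
max_steps = 0
for _ in range(1000):
    P = rng.random((n_actions, 2, 2))
    P /= P.sum(axis=2, keepdims=True)           # each row P[a, s, :] sums to 1
    R = rng.random((n_actions, 2))
    v, pi, steps = policy_iteration(P, R, gamma)
    max_steps = max(max_steps, steps)
print(f"max improvement steps over 1000 random 2-state MDPs with {n_actions} actions: {max_steps}")
```

Keeping the current action unless another one is strictly better is a common safeguard; with plain `argmax` switching, floating-point near-ties can make PI alternate between equally good policies.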

Abstract

We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical apparatus, including a transformation that modifies the discount factor $\gamma$, to improve convergence guarantees for these algorithms in several settings. In particular, one of our results identifies a rotation component in the VI method, and as a consequence shows that when the Markov Reward Process (MRP) induced by the optimal policy is irreducible and aperiodic, the asymptotic convergence rate of value iteration is strictly smaller than $\gamma$.
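
The claim about a rate below $\gamma$ can be probed numerically. The sketch below is our own illustration, not the paper's geometric argument. It assumes a random MDP with strictly positive transition probabilities (so the chain induced by the optimal policy is irreducible and aperiodic) and uses hypothetical helper names (`random_mdp`, `bellman`, `optimal_solution`, `window_rate`). It runs VI from $v_0 = 0$ and measures the per-iteration decay of the error in two ways. In the sup norm, the error component along the all-ones vector asymptotically decays at exactly $\gamma$. In the span seminorm (max minus min), which ignores that constant component, standard Perron-Frobenius reasoning predicts an asymptotic rate of $\gamma\,|\lambda_2(P_{\pi^*})| < \gamma$. Whether the paper measures convergence modulo constants in this way, and how its mixing rate $\tau$ relates to $|\lambda_2|$, are assumptions not settled here.

```python
import numpy as np

def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical test instance: strictly positive transitions, so every
    policy's Markov chain is irreducible and aperiodic."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_actions, n_states, n_states)) + 0.1
    P /= P.sum(axis=2, keepdims=True)           # rows of each P[a] sum to 1
    R = rng.random((n_actions, n_states))
    return P, R

def bellman(v, P, R, gamma):
    """One Bellman optimality update; returns new values and the greedy policy."""
    q = R + gamma * (P @ v)                     # q[a, s] = r(s, a) + gamma * E[v(s')]
    return q.max(axis=0), q.argmax(axis=0)

def optimal_solution(P, R, gamma, n_iter=2000):
    """Take the greedy policy after long VI, then evaluate it exactly."""
    v = np.zeros(P.shape[1])
    for _ in range(n_iter):
        v, pi = bellman(v, P, R, gamma)
    states = np.arange(P.shape[1])
    P_pi, r_pi = P[pi, states], R[pi, states]
    v_star = np.linalg.solve(np.eye(len(v)) - gamma * P_pi, r_pi)
    return v_star, P_pi

def window_rate(err, lo=1e-10, hi=1e-3):
    """Per-iteration decay factor (geometric mean) while lo < err < hi."""
    k = np.flatnonzero((err > lo) & (err < hi))
    return (err[k[-1]] / err[k[0]]) ** (1.0 / (k[-1] - k[0]))

gamma = 0.9
P, R = random_mdp(n_states=4, n_actions=3)
v_star, P_opt = optimal_solution(P, R, gamma)

v = np.zeros(4)
sup_err, span_err = [], []
for _ in range(400):
    v, _ = bellman(v, P, R, gamma)
    e = v - v_star
    sup_err.append(np.abs(e).max())             # sup-norm error
    span_err.append(e.max() - e.min())          # span seminorm: ignores constant shifts
sup_rate = window_rate(np.array(sup_err))
span_rate = window_rate(np.array(span_err))
lam2 = np.sort(np.abs(np.linalg.eigvals(P_opt)))[-2]   # second-largest eigenvalue modulus

print(f"sup-norm rate ~ {sup_rate:.4f}  (gamma = {gamma})")
print(f"span rate     ~ {span_rate:.4f}  (gamma * |lambda_2| = {gamma * lam2:.4f})")
```

Adding a constant $c$ to $v$ shifts every $q(s,a)$ by $\gamma c$ and leaves the greedy policy unchanged, which is why the span seminorm is a natural yardstick for how quickly VI settles on a policy.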

Paper Structure

Figures (3)

Theorems & Definitions (18)