Geometric Re-Analysis of Classical MDP Solving Algorithms
Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis
TL;DR
This work studies the convergence of Value Iteration (VI) and Policy Iteration (PI) for finite MDPs through a geometry-based interpretation, introducing a discount-factor transformation that preserves dynamics and yields an effective discount $\gamma_{\rm eff}$. It reveals a rotation component in VI and proves that, when the Markov Reward Process (MRP) induced by the optimal policy is irreducible and aperiodic, VI converges at a rate strictly faster than the standard bound $\gamma$, with bounds involving the mixing rate $\tau$. An analysis of 2-state MDPs shows that PI converges in at most as many iterations as there are actions, and the paper derives improved VI iteration counts in terms of $\tau^{1/N}$, along with simplified geometric proofs. Overall, the paper provides a new analytical framework for VI and PI, offering improved convergence guarantees and guidance for geometry-informed algorithm design in MDPs.
Abstract
We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical apparatus, including a transformation that modifies the discount factor $\gamma$, to improve convergence guarantees for these algorithms in several settings. In particular, one of our results identifies a rotation component in the VI method, and as a consequence shows that when the Markov Reward Process (MRP) induced by the optimal policy is irreducible and aperiodic, the asymptotic convergence rate of value iteration is strictly smaller than $\gamma$.
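
To make the convergence claim concrete, the following is a minimal numerical sketch, not the paper's construction. It runs standard value iteration on a small random MDP and compares two per-step contraction ratios of the error: one in the sup norm, for which $\gamma$ is the classical bound, and one in the span seminorm (maximum minus minimum), which ignores constant shifts of the value vector. Several things here are assumptions made for illustration: the random MDP itself, the names `value_iteration` and `span`, the use of a long VI run as a stand-in for the optimal values, and the choice of the span seminorm as the error measure, since the abstract does not say which measure its rate refers to. All transition probabilities are strictly positive, so every policy induces an irreducible and aperiodic chain.

```python
import numpy as np


def value_iteration(P, R, gamma, iters):
    """Run standard value iteration and return every iterate.

    P has shape (A, S, S) with P[a, s, t] = Pr(t | s, a);
    R has shape (S, A) with R[s, a] the expected reward.
    """
    v = np.zeros(R.shape[0])
    history = [v]
    for _ in range(iters):
        # q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * v[t]
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        v = q.max(axis=1)
        history.append(v)
    return history


def span(x):
    """Span seminorm: max minus min, which is zero on constant vectors."""
    return x.max() - x.min()


# Hypothetical random MDP (an assumption for illustration only).
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((A, S, S)) + 0.1       # strictly positive entries
P /= P.sum(axis=2, keepdims=True)     # normalize rows to probabilities
R = rng.random((S, A))

history = value_iteration(P, R, gamma, 400)
v_star = history[-1]                  # numerical proxy for the optimal values

# Error vectors for the first few iterates and their step-to-step ratios.
errs = [v - v_star for v in history[:9]]
sup_ratios = [np.abs(errs[k + 1]).max() / np.abs(errs[k]).max() for k in range(8)]
span_ratios = [span(errs[k + 1]) / span(errs[k]) for k in range(8)]

print("gamma:      ", gamma)
print("sup ratios: ", np.round(sup_ratios, 4))
print("span ratios:", np.round(span_ratios, 4))
```

In this sketch the sup-norm ratios never exceed $\gamma$, as the classical contraction argument guarantees, while the span ratios stay strictly below $\gamma$ because strictly positive transition rows contract differences between states. This only illustrates why discarding the constant component of the error can expose a faster rate; it does not reproduce the paper's bounds in terms of $\tau$.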
