Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs
Arsenii Mustafin, Xinyi Sheng, Dominik Baumann
TL;DR
The paper addresses the longstanding split between discounted and average-reward MDP analyses by proposing a geometry-inspired, unified framework that extends an existing geometric interpretation of discounted MDPs to the average-reward setting (γ = 1). It introduces new action and policy vectors that keep the geometric interpretation coherent under both reward criteria, and it shows that Value Iteration converges geometrically when the optimal policy is unique and ergodic. The key contributions are a reformulation of Value Iteration in the average-reward context, invertibility and normalization results for unichain MDPs, and a rigorous contraction bound in the span seminorm. The unification allows a major convergence result known for the discounted case to carry over to the average-reward case.
Abstract
The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into the average-reward and discounted-reward cases, which share similarities but are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs for the discounted-reward case to the average-reward case, thereby unifying both. This allows us to extend a major result known for the discounted-reward case to the average-reward case: under a unique and ergodic optimal policy, the Value Iteration algorithm achieves a geometric convergence rate.
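To make the convergence claim concrete, the following minimal sketch runs undiscounted (γ = 1) Value Iteration on a toy two-state MDP and records the span seminorm of successive differences. It is an illustration under stated assumptions, not the paper's geometric construction: the transition probabilities, the rewards, and the names `span`, `bellman`, and `value_iteration` are all hypothetical. The iterates are re-centered at state 0 after each step, which leaves the span unchanged and only keeps the numbers bounded.

```python
# Toy MDP (hypothetical numbers): P[s][a] is the next-state distribution,
# R[s][a] the immediate reward. All probabilities are positive, so every
# stationary policy induces an ergodic chain.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: action 0, action 1
    [[0.5, 0.5], [0.1, 0.9]],  # state 1: action 0, action 1
]
R = [
    [1.0, 0.0],
    [2.0, 3.0],
]


def span(v):
    """Span seminorm: max(v) - min(v)."""
    return max(v) - min(v)


def bellman(V, P, R):
    """Undiscounted Bellman optimality operator (gamma = 1)."""
    return [
        max(
            R[s][a] + sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(R[s]))
        )
        for s in range(len(V))
    ]


def value_iteration(P, R, tol=1e-10, max_iter=10_000):
    """Iterate until the span of T(V) - V drops below tol.

    Returns the re-centered values, a gain estimate, and the span history.
    """
    V = [0.0] * len(P)
    spans = []
    diff = V
    for _ in range(max_iter):
        V_new = bellman(V, P, R)
        diff = [x - y for x, y in zip(V_new, V)]
        spans.append(span(diff))
        # Re-center at state 0; a constant shift does not affect the span.
        V = [x - V_new[0] for x in V_new]
        if spans[-1] < tol:
            break
    # The optimal gain lies between min(diff) and max(diff).
    gain = 0.5 * (max(diff) + min(diff))
    return V, gain, spans


V, gain, spans = value_iteration(P, R)
print("gain estimate:", gain)
print("first spans:", spans[:5])
```

In this toy instance the optimal policy picks action 1 in both states, and its stationary distribution (1/9, 8/9) gives an average reward of 8/3, which the gain estimate approaches. The recorded spans shrink by a roughly constant factor per iteration once the greedy policy stops changing, which is the kind of geometric rate in the span seminorm that the abstract refers to.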
