Fisher-Rao Gradient Flows of Linear Programs and State-Action Natural Policy Gradients
Johannes Müller, Semih Çaycı, Guido Montúfar
TL;DR
This paper analyzes Fisher-Rao (FR) gradient flows of linear programs, such as those posed over the state-action polytope of a Markov decision process (MDP). It shows linear convergence with a rate governed by the geometry of the feasible region, which also yields sharper bounds on the error induced by entropic regularization. It then extends the framework to natural gradient flows in parameterized measure spaces, establishing sublinear convergence under inexact gradients and distribution mismatch, as well as conditions for global linear convergence in multi-player games. The results apply directly to state-action natural policy gradients in MDPs, yielding sublinear convergence under general parametrizations and linear convergence for regular tabular policies, and they reveal an implicit maximal-entropy bias when the optimizer is not unique. Computational examples illustrate the convergence behavior and demonstrate the practical relevance for reinforcement learning (RL) algorithms that use entropy-regularized objectives and FR-based natural gradients.
Abstract
Kakade's natural policy gradient method has been studied extensively in recent years, and linear convergence has been shown both with and without regularization. We study another natural gradient method, based on the Fisher information matrix of the state-action distributions, which has received little theoretical attention. Here, the state-action distributions follow the Fisher-Rao gradient flow inside the state-action polytope with respect to a linear potential. Therefore, we study Fisher-Rao gradient flows of linear programs more generally and show linear convergence with a rate that depends on the geometry of the linear program. Equivalently, this yields an estimate of the error induced by entropic regularization of the linear program, which improves on existing results. We extend these results and show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error. In particular, these general results cover the case of state-action natural policy gradients.
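A minimal sketch of the simplest instance of this setting, assuming the feasible region is the probability simplex rather than a general state-action polytope. On the simplex, the Fisher-Rao gradient flow of the linear potential <c, p> is the replicator equation dp_i/dt = p_i (c_i - <c, p>). Its solution p_i(t) ∝ p_i(0) exp(t c_i) is also the maximizer of <c, p> - (1/t) KL(p || p(0)), which makes the equivalence between the flow at time t and entropic regularization with strength 1/t concrete. The code integrates the ODE numerically, compares the result with the closed form, and checks an elementary simplex-only bound: max_i c_i - <c, p(t)> <= (max_i c_i - min_i c_i) (1 - p_{i*}(0)) / p_{i*}(0) exp(-gap t). Here gap is the difference between the best and second-best entries of c. The objective vector, the initialization, and all function names are hypothetical. The bound is not the paper's general geometry-dependent rate.

```python
import math


def replicator_rhs(c, p):
    # Fisher-Rao (Shahshahani) gradient of the linear potential <c, p> on the simplex:
    # dp_i/dt = p_i * (c_i - <c, p>)
    mean = sum(ci * pi for ci, pi in zip(c, p))
    return [pi * (ci - mean) for ci, pi in zip(c, p)]


def integrate_rk4(c, p0, t_end, steps=2000):
    """Integrate the flow numerically with the classical Runge-Kutta (RK4) scheme."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = replicator_rhs(c, p)
        k2 = replicator_rhs(c, [pi + 0.5 * h * ki for pi, ki in zip(p, k1)])
        k3 = replicator_rhs(c, [pi + 0.5 * h * ki for pi, ki in zip(p, k2)])
        k4 = replicator_rhs(c, [pi + h * ki for pi, ki in zip(p, k3)])
        p = [pi + h / 6 * (a + 2 * b + 2 * cc + d)
             for pi, a, b, cc, d in zip(p, k1, k2, k3, k4)]
    return p


def closed_form(c, p0, t):
    """p_i(t) ∝ p0_i * exp(t * c_i). This is also the maximizer of
    <c, p> - (1/t) * KL(p || p0), i.e. entropic regularization with strength 1/t."""
    m = max(c)  # shift for numerical stability; it cancels after normalization
    w = [pi * math.exp(t * (ci - m)) for ci, pi in zip(c, p0)]
    z = sum(w)
    return [wi / z for wi in w]


def suboptimality(c, p):
    """Gap between the LP optimum over the simplex and the value at p."""
    return max(c) - sum(ci * pi for ci, pi in zip(c, p))


def gap_bound(c, p0, t):
    """Simplex-only bound, assuming a unique maximizer i* of c:
    max(c) - <c, p(t)> <= (max c - min c) * (1 - p0[i*]) / p0[i*] * exp(-gap * t)."""
    i_star = max(range(len(c)), key=lambda i: c[i])
    gap = c[i_star] - max(cj for j, cj in enumerate(c) if j != i_star)
    return (max(c) - min(c)) * (1 - p0[i_star]) / p0[i_star] * math.exp(-gap * t)


if __name__ == "__main__":
    c = [1.0, 0.7, 0.2]  # hypothetical objective vector
    p0 = [1 / 3] * 3     # uniform initialization in the interior of the simplex
    for t in (1.0, 5.0, 10.0):
        p_num = integrate_rk4(c, p0, t)
        p_exact = closed_form(c, p0, t)
        err = max(abs(a - b) for a, b in zip(p_num, p_exact))
        print(f"t={t}: suboptimality={suboptimality(c, p_exact):.5f}, "
              f"bound={gap_bound(c, p0, t):.5f}, ODE vs closed form={err:.1e}")
```

In this special case, the suboptimality decays exponentially in t with rate given by the gap, which illustrates the kind of linear convergence described above; for a general polytope the closed form and the rate constant are different.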
