Dynamic Programming: From Local Optimality to Global Optimality
John Stachurski, Jingni Yang, Ziyue Yang
TL;DR
The paper asks when local optimality at a single state, namely $v_\sigma(x)=v^*(x)$ for one state $x$, propagates to global optimality in continuous-state MDPs. It develops sufficient conditions based on irreducibility (strong, open-set, and $\pi$-irreducibility) together with reachability and continuity. These conditions guarantee that if $v_\sigma=v^*$ holds at one state, it holds everywhere, so global optimality can be inferred from pointwise information. A key result is that, under strong irreducibility, the local-to-global conditions are equivalent; the authors then extend these ideas to weaker topological conditions that still transmit optimality across states. They illustrate the theory with an optimal savings problem solved by gradient ascent over neural-network policies, comparing the result against a global solver. They also present reducible MDPs and a two-state example showing why the irreducibility assumptions are needed, and discuss extensions to unbounded rewards, state-dependent discounting, and dynamic programs that are not MDPs, highlighting practical implications for policy-based algorithms in large-scale dynamic programming.
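To make the role of irreducibility concrete, here is a minimal finite-state sketch in Python with NumPy. It is not the authors' code and not their two-state example: the rewards, transition arrays, discount factor `beta = 0.9`, and the helper names `policy_value` and `optimal_value` are illustrative assumptions. It computes $v_\sigma$ exactly from the linear policy-evaluation equation and $v^*$ by Howard policy iteration, then compares them state by state for a fixed policy $\sigma$ that picks the better action in state 0 but the worse action in state 1.

```python
import numpy as np

def policy_value(r, P, sigma, beta):
    """Exact value of policy sigma: solves (I - beta * P_sigma) v = r_sigma."""
    n = r.shape[0]
    idx = np.arange(n)
    r_sigma = r[idx, sigma]        # r_sigma[x] = r(x, sigma(x))
    P_sigma = P[idx, sigma, :]     # P_sigma[x, y] = P(x, sigma(x), y)
    return np.linalg.solve(np.eye(n) - beta * P_sigma, r_sigma)

def optimal_value(r, P, beta, max_iter=1000):
    """Howard policy iteration for a finite MDP; returns v*."""
    n = r.shape[0]
    idx = np.arange(n)
    sigma = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        v = policy_value(r, P, sigma, beta)
        # q[x, a] = r(x, a) + beta * sum_y P(x, a, y) v(y)
        q = r + beta * (P @ v)
        greedy = np.argmax(q, axis=1)
        # Switch action only where the greedy one is strictly better (avoids cycling on ties)
        improve = q[idx, greedy] > q[idx, sigma] + 1e-12
        if not improve.any():
            return v
        sigma = np.where(improve, greedy, sigma)
    return v

beta = 0.9
# Hypothetical rewards: in both states, action 0 pays 1 and action 1 pays 0
r = np.array([[1.0, 0.0],
              [1.0, 0.0]])
# Fixed policy: optimal action in state 0, suboptimal action in state 1
sigma = np.array([0, 1])

# Reducible dynamics: each state is absorbing under every action
P_red = np.zeros((2, 2, 2))
P_red[0, :, 0] = 1.0
P_red[1, :, 1] = 1.0

# Irreducible dynamics: from any state, under any action, move to each state w.p. 1/2
P_irr = np.full((2, 2, 2), 0.5)

for name, P in [("reducible", P_red), ("irreducible", P_irr)]:
    v_sigma = policy_value(r, P, sigma, beta)
    v_star = optimal_value(r, P, beta)
    print(name, "v_sigma =", v_sigma, "v* =", v_star)
```

In the reducible version each state is absorbing, so $\sigma$ attains $v^*(0) = 10$ while $v_\sigma(1) = 0$: optimality at state 0 says nothing about state 1. In the irreducible version every state is reached from every other with positive probability, so the loss incurred at state 1 feeds back into state 0 and $v_\sigma(0) = 5.5 < 10$; equivalently, optimality at either single state would force optimality at both. This is only the finite-state intuition; the irreducibility notions in the paper are aimed at general, possibly continuous, state spaces.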
Abstract
In the theory of dynamic programming, an optimal policy is a policy whose lifetime value dominates that of all other policies from every possible initial condition in the state space. This raises a natural question: when does optimality from a single state imply optimality from every state? Working in a general setting, we provide sufficient conditions for this property that relate to reachability and irreducibility. Our results have significant implications for modern policy-based algorithms used to solve large-scale dynamic programs. We illustrate our findings by applying them to an optimal savings problem via an algorithm that implements gradient ascent in a policy space constructed from neural networks.
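For orientation, the objects in the abstract can be written in standard notation for a discounted MDP. This assumes a discount factor $\beta \in (0,1)$, bounded rewards $r_\sigma(x) = r(x, \sigma(x))$, and a transition kernel $P_\sigma$ under policy $\sigma$ on a state space $\mathsf{X}$; the setting in the paper is more general (for example, it also covers state-dependent discounting).

$$
v_\sigma(x) = \sum_{t=0}^{\infty} \beta^t \, (P_\sigma^t r_\sigma)(x), \qquad v^*(x) = \sup_{\tau} v_\tau(x), \qquad x \in \mathsf{X}.
$$

A policy $\sigma$ is optimal when $v_\sigma(x) = v^*(x)$ for every $x \in \mathsf{X}$, which is the same as $v_\sigma \geq v_\tau$ pointwise for every policy $\tau$. The local-to-global question is when

$$
v_\sigma(\bar x) = v^*(\bar x) \ \text{for some } \bar x \in \mathsf{X} \quad \Longrightarrow \quad v_\sigma(x) = v^*(x) \ \text{for all } x \in \mathsf{X}.
$$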
