Dynamic Programming: From Local Optimality to Global Optimality

John Stachurski, Jingni Yang, Ziyue Yang

TL;DR

The paper addresses when local optimality at a single state, namely $v_\sigma(x)=v^*(x)$, propagates to global optimality in continuous-state MDPs. It develops sufficient conditions based on irreducibility (strong, open-set, and $\pi$-irreducibility) and reachability/continuity to ensure $v_\sigma=v^*$ everywhere if it holds at one state, enabling global optimality to be inferred from pointwise information. The authors prove a key result that strong irreducibility makes the local-to-global conditions equivalent, and they extend these ideas to weaker topological conditions that still transmit optimality across states; they illustrate the theory with an optimal savings problem solved via gradient ascent over neural-network policies, comparing against a global solver. They also provide reducible MDPs and a two-state example to show the necessity of irreducibility assumptions and discuss extensions to unbounded rewards, state-dependent discounting, and non-MDP dynamic programs, highlighting practical implications for policy-based algorithms in large-scale DP.
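The role of irreducibility can be seen in a finite toy problem. The sketch below is a hypothetical two-state, two-action MDP written for illustration; it is not the paper's two-state example, and the rewards, transition probabilities, discount factor, and function names are all assumptions. In the reducible version each state is absorbing, so a policy can be optimal at state 0 while being suboptimal at state 1. In the irreducible version every state reaches every other state, and the same policy is then suboptimal at both states.

```python
import itertools
import numpy as np

BETA = 0.9  # assumed discount factor


def policy_value(r, P, sigma, beta=BETA):
    """Lifetime value v_sigma, solving v = r_sigma + beta * P_sigma v."""
    n = r.shape[0]
    idx = np.arange(n)
    r_sigma = r[idx, sigma]   # reward in each state under sigma
    P_sigma = P[idx, sigma]   # n x n transition matrix under sigma
    return np.linalg.solve(np.eye(n) - beta * P_sigma, r_sigma)


def optimal_value(r, P, beta=BETA):
    """Value function v*, by brute-force enumeration of all policies."""
    n, m = r.shape
    values = [
        policy_value(r, P, np.array(s), beta)
        for s in itertools.product(range(m), repeat=n)
    ]
    return np.max(values, axis=0)  # pointwise maximum over policies


# r[x, a]: reward at state x under action a (hypothetical numbers).
# Both actions are equivalent at state 0; only state 1 has a real choice.
r = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Reducible dynamics: each state is absorbing under every action.
P_reducible = np.zeros((2, 2, 2))
P_reducible[0, :, 0] = 1.0
P_reducible[1, :, 1] = 1.0

# Irreducible dynamics: from any state, each state is reached w.p. 1/2.
P_irreducible = np.full((2, 2, 2), 0.5)

# A policy that picks the low-reward action at state 1.
sigma = np.array([0, 0])

gap_reducible = optimal_value(r, P_reducible) - policy_value(r, P_reducible, sigma)
gap_irreducible = optimal_value(r, P_irreducible) - policy_value(r, P_irreducible, sigma)

print("v* - v_sigma (reducible):  ", gap_reducible)
print("v* - v_sigma (irreducible):", gap_irreducible)
```

In the reducible case the gap is zero at state 0 and positive at state 1, so optimality at one state says nothing about the other. In the irreducible case the gap is positive at both states, consistent with the idea that a failure of optimality anywhere is visible from everywhere.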

Abstract

In the theory of dynamic programming, an optimal policy is a policy whose lifetime value dominates that of all other policies from every possible initial condition in the state space. This raises a natural question: when does optimality from a single state imply optimality from every state? Working in a general setting, we provide sufficient conditions for this property that relate to reachability and irreducibility. Our results have significant implications for modern policy-based algorithms used to solve large-scale dynamic programs. We illustrate our findings by applying them to an optimal savings problem via an algorithm that implements gradient ascent in a policy space constructed from neural networks.

Paper Structure

This paper contains 20 sections, 19 theorems, 40 equations, 6 figures, 1 algorithm.

Key Result

Lemma 2.1

Every point evaluation functional on $b\mathsf X$ is a nonzero element of $b\mathsf X_+'$.
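
A brief sketch of why a statement of this kind holds, assuming the usual reading of the notation (this reading is an assumption, since the summary does not define it): $b\mathsf X$ is the space of bounded measurable real-valued functions on the state space $\mathsf X$, and $b\mathsf X_+'$ is the set of positive continuous linear functionals on $b\mathsf X$. For a fixed $x \in \mathsf X$, write the point evaluation functional as $\delta_x$, a symbol introduced here for illustration:

$$
\delta_x(f) = f(x), \qquad
\delta_x(\alpha f + \beta g) = \alpha\, \delta_x(f) + \beta\, \delta_x(g), \qquad
f \geq 0 \implies \delta_x(f) \geq 0, \qquad
|\delta_x(f)| \leq \|f\|_\infty, \qquad
\delta_x(\mathbb 1) = 1 \neq 0.
$$

Under these assumptions, linearity, positivity, and boundedness place $\delta_x$ in $b\mathsf X_+'$, and evaluating at the constant function $\mathbb 1$ shows it is nonzero.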

Figures (6)

  • Figure 1: $v_{\hat{\sigma}}$ and $\hat{\sigma}$ with $\bar{w} = 1$ vs OPI solutions
  • Figure 3: The upper bound law of motion for wealth
  • Figure 4: $\hat{v}_\sigma$ and $\hat{\sigma}$ with $\bar{w} = 1$ vs OPI solutions
  • Figure 6: $\hat{v}_\sigma$ and $\hat{\sigma}$ with $\bar{w} = 50$ against the OPI solutions.
  • Figure 8: $\hat{v}_\sigma$ and $\hat{\sigma}$ with $\bar{w} = 50$ vs OPI solutions
  • ...and 1 more figure

Theorems & Definitions (37)

  • Lemma 2.1
  • proof
  • Proposition 2.2
  • proof
  • Lemma 2.3
  • proof
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • ...and 27 more