Weakly Time-Coupled Approximation of Markov Decision Processes

Negar Soheili; Selvaprabu Nadarajah; Bo Yang

Abstract

Finite-horizon Markov decision processes (MDPs) with high-dimensional exogenous uncertainty and endogenous states arise in operations and finance, including the valuation and exercise of Bermudan and real options, but face a scalability barrier as computational complexity grows with the horizon. A common approximation represents the value function using basis functions, but methods for fitting the weights treat cross-stage optimization differently. Least squares Monte Carlo (LSM) fits weights via backward recursion and regression, avoiding joint optimization but accumulating error over the horizon. Approximate linear programming (ALP) and pathwise optimization (PO) jointly fit weights to produce upper bounds, but temporal coupling causes computational complexity to grow with the horizon. We show this coupling is an artifact of the approximation architecture, and develop a weakly time-coupled approximation (WTCA) where cross-stage dependence is independent of the horizon. For any fixed basis function set, the WTCA upper bound is tighter than that of ALP and looser than that of PO, and converges to the optimal policy value as the basis family expands. We extend parallel deterministic block coordinate descent to the stochastic MDP setting, exploiting weak temporal coupling. Applied to WTCA, weak coupling yields computational complexity independent of the horizon. Within an equal time budget, solving WTCA accommodates more exogenous samples or basis functions than PO, yielding tighter bounds despite PO being tighter for fixed samples and basis functions. On Bermudan option and ethanol production instances, WTCA produces tighter upper bounds than PO and LSM in every instance tested, with near-optimal policies at longer horizons.
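
To make the LSM step mentioned above concrete (fitting basis-function weights by backward recursion and regression), here is a minimal sketch. It is not the authors' implementation and does not cover WTCA, ALP, or PO. It assumes a Bermudan put on a single asset following geometric Brownian motion, a quadratic polynomial basis, and regression on in-the-money paths only; the function name `lsm_bermudan_put` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def lsm_bermudan_put(s0=100.0, strike=100.0, r=0.05, sigma=0.2,
                     maturity=1.0, n_steps=12, n_paths=20000, seed=0):
    """Estimate a Bermudan put value with least squares Monte Carlo.

    Illustrative assumptions: geometric Brownian motion prices, exercise
    allowed at each of the n_steps dates after time 0, basis (x^2, x, 1).
    """
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    disc = np.exp(-r * dt)

    # Simulate prices at times dt, 2*dt, ..., maturity (one column per date).
    z = rng.standard_normal((n_paths, n_steps))
    log_steps = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    prices = s0 * np.exp(np.cumsum(log_steps, axis=1))

    # Cash flow of each path under the current policy, valued at the
    # date being processed. Start from the payoff at maturity.
    cash = np.maximum(strike - prices[:, -1], 0.0)

    # Backward recursion over the earlier exercise dates.
    for k in range(n_steps - 2, -1, -1):
        cash = cash * disc  # discount future cash flows back one date
        exercise = np.maximum(strike - prices[:, k], 0.0)
        itm = exercise > 0.0
        if itm.sum() < 3:
            continue  # too few points to fit three basis weights

        # Regress discounted future cash flows on the basis functions.
        basis = np.vander(prices[itm, k] / strike, 3)
        weights = np.linalg.lstsq(basis, cash[itm], rcond=None)[0]
        continuation = basis @ weights

        # Exercise where the immediate payoff beats the fitted continuation.
        cash[itm] = np.where(exercise[itm] > continuation,
                             exercise[itm], cash[itm])

    # Discount from the first exercise date back to time 0 and average.
    return float(disc * cash.mean())


if __name__ == "__main__":
    print(lsm_bermudan_put())
```

Each date's weights are fitted separately, given the policy already fixed for later dates, so no joint optimization across stages is needed. The same structure is why regression errors at later dates propagate into earlier fits, which is the error accumulation the abstract attributes to LSM.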

Paper Structure (61 sections, 12 theorems, 166 equations, 2 figures, 6 tables, 2 algorithms)

Introduction
Finite-Horizon Markov Decision Processes
Weakening Temporal Coupling in Optimization-Based Approximations
A Unified Stochastic Optimization Framework
Fully Time-Coupled Models: Pathwise Optimization and Approximate Linear Programming
Pathwise optimization.
Approximate linear programming.
A Weakly Time-Coupled Approximation
Computational Tradeoffs and Model Selection
Sample paths.
Basis functions.
Exploiting Weak Time Coupling via Parallel Computation
Smoothing and Stochastic Gradient Descent
A Parallel Stochastic Block Coordinate Descent
Model Selection Under Computational Time Budgets
...and 46 more sections

Key Result

Proposition 1

Let $\mathcal{D}$ denote the set of sequences $\alpha = (\alpha_0,\ldots,\alpha_{T-1})$, where each $\alpha_t$ is a probability measure on $\mathcal{W}_t$. Suppose $\Omega$ is chosen large enough that the optimal solution of eq:ALP-def lies in its interior and $\phi_{t,1} \equiv 1$ for all $t$ [...] are equal, and their sets of optimal solutions in $\beta$ coincide.

Figures (2)

Figure 1: Convergence of upper and lower bounds for WTCA (left) and PO (right) in the instance with $T=36$, $N=8$, and $w^I=100$.
Figure EC.1: Endogenous state transitions in ethanol production (guthrie2009real; yang2024least; yang2025improved).

Theorems & Definitions (23)

Definition 1: Weak and Full Temporal Coupling
Proposition 1
Proposition 2
Proposition 3
Theorem 1
Corollary 1
Proposition 4
proof
proof
Lemma EC.1: Theorem 5.12 in beck2017
...and 13 more
