Dual dynamic programming for stochastic programs over an infinite horizon
Caleb Ju, Guanghui Lan
TL;DR
The paper tackles infinite-horizon, stationary multi-stage stochastic programs with discounted costs by developing CE-Inf-EDDP, a continually-exploring variant of explorative dual dynamic programming that shares the favorable complexity of state-of-the-art methods while improving practical performance. It introduces a basic Inf-EDDP framework with forward/backward phases, a stage-wise SAA, and a saturation-based trial-point strategy that propagates cutting-plane updates across stages, leading to non-asymptotic convergence guarantees. To further reduce dependence on the effective horizon $T$, it proposes Case 1 and Case 2 variants (collectively CE-Inf-EDDP), including Case 1 with a theoretical bound $K=4T(D/\epsilon+1)^n$, and extends the approach to hierarchical stationary problems via 2-stage stochastic approximation (2SSA) inside a CE-Inf-HDDP framework with provable sample and iteration complexities. The authors support their methods with extensive numerical experiments on infinite-horizon newsvendor, risk-averse variants, hydrothermal planning, and a newsvendor with secondary assembly, demonstrating tight dual bounds, favorable runtimes, and effective policy generalization compared with finite-horizon methods and randomized SDDP variants. The work contributes non-asymptotic analysis, horizon-robust exploration, and a scalable hierarchy-capable DDP toolkit, with open-source code to promote reproducibility and application in energy and supply-chain domains.
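To give a feel for how the Case 1 bound $K=4T(D/\epsilon+1)^n$ scales, the following minimal sketch simply evaluates the formula. The summary does not define the symbols beyond calling $T$ the effective horizon; reading $D$ as a diameter-type constant of the feasible region, $\epsilon$ as the target accuracy, and $n$ as the state dimension is an assumption made here for illustration, and the function name `iteration_bound` is hypothetical.

```python
def iteration_bound(T, D, eps, n):
    """Evaluate K = 4 * T * (D / eps + 1) ** n.

    Assumed meanings (not defined in the summary itself):
      T   - effective horizon
      D   - diameter-type constant of the feasible region
      eps - target accuracy (must be positive)
      n   - state dimension
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return 4 * T * (D / eps + 1) ** n


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for n in (1, 2, 4):
        print(n, iteration_bound(T=10, D=1.0, eps=0.5, n=n))
```

The bound is linear in $T$ but grows geometrically in $n$, which is why the dependence on the horizon is the part the variants aim to reduce.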
Abstract
We consider solving stochastic programs over an infinite horizon. By leveraging the stationarity of the problem, we develop a novel continually-exploring infinite-horizon explorative dual dynamic programming (CE-Inf-EDDP) algorithm that matches state-of-the-art complexity while providing encouraging numerical performance on the newsvendor and hydrothermal planning problems. CE-Inf-EDDP conceptually differs from previous dual dynamic programming approaches by exploring the feasible region longer and updating the cutting-plane model more frequently. In addition, our algorithm can handle costs ranging from simple linear to more complex nonlinear ones. To demonstrate this, we extend our algorithm to handle the so-called hierarchical stationary stochastic program, where the cost function is a parametric multi-stage stochastic program. The hierarchical program can model problems with a hierarchy of decision-making, e.g., how long-term decisions influence day-to-day operations. As a concrete example, we introduce a newsvendor problem that includes a second-stage multi-product assembly serving as a secondary market.
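The cutting-plane model mentioned above can be pictured as a piecewise-linear lower approximation of a convex value function, built as the maximum of affine cuts. The sketch below is a generic one-dimensional illustration of that idea, not the CE-Inf-EDDP algorithm itself; the class `CutModel`, the test function $f(x)=x^2$, and the floor value of 0 are all assumptions chosen for the example.

```python
class CutModel:
    """Piecewise-linear lower model: max of a floor and a set of affine cuts."""

    def __init__(self, floor=0.0):
        # floor must be a valid lower bound on the true function (assumed known).
        self.floor = floor
        self.cuts = []  # list of (intercept, slope) pairs

    def add_cut(self, x0, value, slope):
        # A cut through (x0, value) with the given (sub)gradient:
        #   x -> value + slope * (x - x0)
        self.cuts.append((value - slope * x0, slope))

    def __call__(self, x):
        return max([self.floor] + [a + g * x for a, g in self.cuts])


def f(x):
    # Stand-in convex "value function" for illustration.
    return x * x


def df(x):
    return 2 * x


model = CutModel(floor=0.0)
for trial_point in (1.0, -2.0):
    # Each visit to a trial point adds one tangent cut to the model.
    model.add_cut(trial_point, f(trial_point), df(trial_point))
```

Each added cut tightens the model near its trial point while keeping it a valid lower bound everywhere, which is why more frequent updates at well-chosen points can improve the dual bound.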
