
Dual dynamic programming for stochastic programs over an infinite horizon

Caleb Ju, Guanghui Lan

TL;DR

The paper tackles infinite-horizon, stationary multi-stage stochastic programs with discounted costs by developing CE-Inf-EDDP, a continually-exploring variant of explorative dual dynamic programming that shares the favorable complexity of state-of-the-art methods while improving practical performance. It introduces a basic Inf-EDDP framework with forward/backward phases, a stage-wise SAA, and a saturation-based trial-point strategy that propagates cutting-plane updates across stages, leading to non-asymptotic convergence guarantees. To further reduce dependence on the effective horizon $T$, it proposes Case 1 and Case 2 variants (collectively CE-Inf-EDDP), including Case 1 with a theoretical bound $K=4T(D/\epsilon+1)^n$, and extends the approach to hierarchical stationary problems via 2-stage stochastic approximation (2SSA) inside a CE-Inf-HDDP framework with provable sample and iteration complexities. The authors support their methods with extensive numerical experiments on infinite-horizon newsvendor, risk-averse variants, hydrothermal planning, and a newsvendor with secondary assembly, demonstrating tight dual bounds, favorable runtimes, and effective policy generalization compared with finite-horizon methods and randomized SDDP variants. The work contributes non-asymptotic analysis, horizon-robust exploration, and a scalable hierarchy-capable DDP toolkit, with open-source code to promote reproducibility and application in energy and supply-chain domains.
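To get a feel for the Case 1 bound quoted above, $K=4T(D/\epsilon+1)^n$, the following minimal sketch simply evaluates the formula. This summary does not define the symbols, so the code treats them as plain numeric inputs; reading $n$ as the state dimension and $D$ as a diameter of the feasible region is an assumption made here for illustration, and the function name `case1_iteration_bound` is hypothetical.

```python
def case1_iteration_bound(T: float, D: float, eps: float, n: int) -> float:
    """Evaluate K = 4 * T * (D / eps + 1) ** n.

    T   : effective horizon (taken as given; not derived here)
    D   : assumed to be a diameter of the feasible region
    eps : target accuracy, must be positive
    n   : assumed to be the state dimension
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 4 * T * (D / eps + 1) ** n


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the paper.
    for n in (1, 2, 3):
        print(n, case1_iteration_bound(T=10, D=1.0, eps=0.5, n=n))
```

Under these inputs the bound grows by a factor of $D/\epsilon+1$ with each added dimension, while it is only linear in $T$.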

Abstract

We consider solving stochastic programs over an infinite horizon. By leveraging the stationarity of the problem, we develop a novel continually-exploring infinite-horizon explorative dual dynamic programming (CE-Inf-EDDP) algorithm that matches state-of-the-art complexity while providing encouraging numerical performance on the newsvendor and hydrothermal planning problems. CE-Inf-EDDP conceptually differs from previous dual dynamic programming approaches by exploring the feasible region longer and updating the cutting-plane model more frequently. In addition, our algorithm can handle costs ranging from simple linear to more complex nonlinear ones. To demonstrate this, we extend our algorithm to handle the so-called hierarchical stationary stochastic program, where the cost function is a parametric multi-stage stochastic program. The hierarchical program can model problems with a hierarchy of decision-making, e.g., how long-term decisions influence day-to-day operations. As a concrete example, we introduce a newsvendor problem that includes a second-stage multi-product assembly serving as a secondary market.
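The abstract refers to a cutting-plane model that is updated as new points are explored. The sketch below illustrates only that generic building block, not CE-Inf-EDDP itself: a convex function is under-approximated by the maximum of affine cuts, and adding a cut at a new trial point tightens the model there. The one-dimensional function $f(x)=x^2$, the trial points, and the helper names `make_cut` and `model_value` are assumptions chosen for illustration.

```python
def make_cut(f, grad, x0):
    """Return (intercept, slope) of the tangent cut f(x0) + grad(x0) * (x - x0)."""
    slope = grad(x0)
    return f(x0) - slope * x0, slope


def model_value(cuts, x):
    """Evaluate the cutting-plane model: the pointwise maximum of all cuts."""
    return max(intercept + slope * x for intercept, slope in cuts)


def f(x):
    # Toy convex function standing in for a value function.
    return x * x


def grad(x):
    return 2 * x


# Start with cuts generated at two trial points.
cuts = [make_cut(f, grad, x0) for x0 in (-1.0, 1.0)]
gap_before = f(0.0) - model_value(cuts, 0.0)

# Adding a cut at a new trial point closes the gap at that point.
cuts.append(make_cut(f, grad, 0.0))
gap_after = f(0.0) - model_value(cuts, 0.0)

if __name__ == "__main__":
    print("gap at x=0 before new cut:", gap_before)
    print("gap at x=0 after new cut:", gap_after)
```

Because each cut is a tangent of a convex function, the model never exceeds $f$, so the gap between $f$ and the model is always non-negative and can only shrink as cuts are added.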

Paper Structure

This paper contains 19 sections, 23 theorems, 33 equations, 3 figures, 3 tables, 5 algorithms.

Key Result

Corollary 2.1

For any fixed $c \in \xi \in \Theta$ (e.g., $\tilde{c}_i$) and any $\bar{\epsilon} \in (0,+\infty)$, the cost function $h(\cdot, c)$ is Lipschitz continuous over $\mathcal{X}(\bar{\epsilon})$, i.e., there exists an $M_h < +\infty$ such that $|h(x, c) - h(x', c)| \le M_h \|x - x'\|$ for all $x, x' \in \mathcal{X}(\bar{\epsilon})$. Also, $h$ is bounded over $\mathcal{X}(\bar{\epsilon})$.

Figures (3)

  • Figure 1: Optimality gap convergence for different DDP methods. Similar to previous experiments, EDDP and SDDP share the same gaps.
  • Figure 2: Gaps for solving the infinite-horizon hydrothermal problem with $\lambda=0.8$ (left) and $\lambda=0.9906$ (right). CE-Inf-SDDP runs are repeated over 10 seeds; the shaded region shows the 10% and 90% quantiles, and the solid/dashed line shows the median performance.
  • Figure :

Theorems & Definitions (42)

  • Corollary 2.1
  • proof
  • Lemma 2.2
  • proof
  • Definition 2.1
  • Definition 2.2
  • Lemma 2.3
  • proof
  • Lemma 2.4
  • proof
  • ...and 32 more