Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization

Chanyeong Kim; Jongwoong Park; Hyunglip Bae; Woo Chang Kim

TL;DR

This work tackles the computational bottleneck of large-scale multistage stochastic programming (MSP) by introducing TranSDDP, a Transformer-based method that sequentially generates subgradient cuts to approximate the convex, piecewise-linear value function $Q_t(x_{t-1})$. By learning cuts from a parametric family of stochastic elements and reusing past information, TranSDDP achieves significant speedups over traditional SDDP while preserving solution quality, and a decoder-only variant offers reduced complexity. Empirical results on energy planning, financial planning, and production planning problems show favorable evaluation times and robust feasibility, with performance competitive with or superior to existing approaches such as VFGL and ν-SDDP in various settings. The approach highlights the potential of sequence models to accelerate MSP broadly and suggests transfer learning as a way to scale to more diverse problem families.

Abstract

Solving large-scale multistage stochastic programming (MSP) problems poses a significant challenge as commonly used stagewise decomposition algorithms, including stochastic dual dynamic programming (SDDP), face growing time complexity as the subproblem size and problem count increase. Traditional approaches approximate the value functions as piecewise linear convex functions by incrementally accumulating subgradient cutting planes from the primal and dual solutions of stagewise subproblems. Recognizing these limitations, we introduce TranSDDP, a novel Transformer-based stagewise decomposition algorithm. This innovative approach leverages the structural advantages of the Transformer model, implementing a sequential method for integrating subgradient cutting planes to approximate the value function. Through our numerical experiments, we affirm TranSDDP's effectiveness in addressing MSP problems. It efficiently generates a piecewise linear approximation for the value function, significantly reducing computation time while preserving solution quality, thus marking a promising progression in the treatment of large-scale multistage stochastic programming problems.
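To make the cutting-plane idea in the abstract concrete, the following is a minimal sketch of how a convex function can be approximated from below by the pointwise maximum of subgradient cuts. It is not TranSDDP or SDDP itself: the one-dimensional function `q`, its derivative `dq`, and the helper names `make_cut` and `lower_approx` are assumptions introduced purely for illustration, with `q` standing in for a value function $Q_t(x_{t-1})$.

```python
from typing import Callable, List, Tuple

# A cut is stored as (intercept, slope): cut(x) = intercept + slope * x.
Cut = Tuple[float, float]


def make_cut(f: Callable[[float], float],
             grad: Callable[[float], float],
             x_k: float) -> Cut:
    """Build the subgradient cut of f at the trial point x_k."""
    slope = grad(x_k)
    intercept = f(x_k) - slope * x_k
    return (intercept, slope)


def lower_approx(cuts: List[Cut], x: float) -> float:
    """Piecewise-linear lower approximation: maximum over all cuts."""
    return max(a + b * x for a, b in cuts)


# Toy convex function standing in for a value function (illustrative assumption).
def q(x: float) -> float:
    return x * x


def dq(x: float) -> float:
    return 2.0 * x


# Accumulate cuts one trial point at a time, as a stagewise method would.
cuts: List[Cut] = []
for x_k in (-2.0, 0.0, 1.0, 3.0):
    cuts.append(make_cut(q, dq, x_k))

if __name__ == "__main__":
    for x in (-1.0, 1.0, 2.0):
        print(x, lower_approx(cuts, x), q(x))
```

Each cut is tight at its own trial point and never exceeds the function elsewhere, so adding cuts can only tighten the approximation. In SDDP the intercepts and slopes come from primal and dual solutions of stagewise subproblems; the abstract describes TranSDDP as generating such cuts sequentially with a Transformer instead.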

Paper Structure (40 sections, 23 equations, 16 figures, 7 tables, 1 algorithm)

Introduction
Preliminary
Problem Setting
Stochastic Dual Dynamic Programming
Improving the efficiency of SDDP
Parametric Value Function Approximation
Selection of subgradient cutting planes
Generation of subgradient cutting planes
Transformer
Model
Input and Output Sequence
Model Architecture
Dataset
Learning System
Experiments
...and 25 more sections

Figures (16)

Figure 1: Time elapsed per problem
Figure 2: Infeasibility ratio per epoch
Figure 3: Architecture of TranSDDP
Figure 4: Architecture of TranSDDP-Decoder
Figure 5: Design of training system
...and 11 more figures
