From Sequential to Parallel: Reformulating Dynamic Programming as GPU Kernels for Large-Scale Stochastic Combinatorial Optimization
Jingyi Zhao, Linxin Yang, Haohua Zhang, Tian Ding
TL;DR
The paper tackles the intractable second-stage dynamic programs in scenario-based stochastic programming by introducing a GPU-based, scenario-batched dynamic-programming framework. It reframes DP recursions as batched min-plus computations and implements hardware-aware kernels that expose parallelism across scenarios, DP layers, and action choices, enabling Bellman updates over more than $10^6$ realizations in a single pass. Two instantiations demonstrate the approach: a giant-tour split DP for capacitated vehicle routing with stochastic demand (CVRPSD) and a forward inventory reinsertion DP for dynamic stochastic IRP, each leveraging 2D/3D GPU parallelism to achieve near-linear scaling and large speedups over CPU baselines. Experimental results show that larger scenario sets reduce estimation bias and improve decision quality, and that GPU acceleration translates directly into better first-stage decisions within fixed time budgets. This work establishes a practical path to large-scale, realistic stochastic discrete optimization by delivering full-fidelity second-stage evaluation at scales previously considered infeasible.
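To make "DP recursions as batched min-plus computations" concrete, here is a minimal NumPy sketch of one Bellman layer evaluated for all scenarios at once. It is an illustration under stated assumptions, not the paper's kernel: the function name `batched_min_plus_step`, the array layout (scenarios on the leading axis), and the toy numbers are all hypothetical. A GPU version would map the scenario and state axes to thread blocks rather than rely on NumPy broadcasting.

```python
import numpy as np

def batched_min_plus_step(values, costs):
    """One min-plus Bellman update for every scenario simultaneously.

    values: (S, N) array, values[s, i] = cost-to-reach state i in scenario s.
    costs:  (S, N, M) array, costs[s, i, j] = transition cost i -> j in scenario s.
    Returns an (S, M) array with new_values[s, j] = min_i values[s, i] + costs[s, i, j].
    """
    # Broadcast values over the target-state axis, then reduce over source states.
    return np.min(values[:, :, None] + costs, axis=1)

# Toy data (hypothetical): 2 scenarios, 2 source states, 2 target states.
values = np.array([[0.0, 1.0],
                   [2.0, 0.0]])
costs = np.array([[[5.0, 1.0], [1.0, 9.0]],
                  [[1.0, 1.0], [4.0, 0.0]]])
new_values = batched_min_plus_step(values, costs)
```

The "plus" is the broadcast addition and the "min" is the reduction over source states. No scenario depends on any other, which is what makes the scenario axis trivially parallel.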
Abstract
A major bottleneck in scenario-based Sample Average Approximation (SAA) for stochastic programming (SP) is the cost of solving an exact second-stage problem for every scenario, especially when each scenario contains an NP-hard combinatorial structure. This has led much of the SP literature to restrict the second stage to linear or simplified models. We develop a GPU-based framework that makes full-fidelity integer second-stage models tractable at scale. The key innovation is a set of hardware-aware, scenario-batched GPU kernels that expose parallelism across scenarios, dynamic-programming (DP) layers, and route or action options, enabling Bellman updates to be executed in a single pass over more than 1,000,000 realizations. We evaluate the approach in two representative SP settings: a vectorized split operator for stochastic vehicle routing and a DP for inventory reinsertion. The implementation scales nearly linearly in the number of scenarios and achieves speedups ranging from one to two up to four to five orders of magnitude. This computational leverage directly improves decision quality: much larger scenario sets and many more first-stage candidates can be evaluated within fixed time budgets, consistently yielding stronger SAA solutions. Our results show that full-fidelity integer second-stage models are tractable at scales previously considered impossible, providing a practical path to large-scale, realistic stochastic discrete optimization.
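As a rough illustration of a vectorized split operator, the sketch below runs a standard giant-tour split DP for all demand scenarios in lockstep and averages the results into an SAA estimate. It assumes, hypothetically, that the second stage re-splits a fixed giant tour once demands are realized. The name `batched_split`, the data layout, and the toy instance are illustrative choices, not the paper's implementation.

```python
import numpy as np

def batched_split(dist, tour, demands, capacity):
    """Optimal split cost of a giant tour, computed per demand scenario.

    dist:     (n+1, n+1) distance matrix, depot at index 0.
    tour:     length-n sequence of customer indices (the giant tour).
    demands:  (S, n) array, demands[s, k] = demand at tour position k in scenario s.
    capacity: vehicle capacity.
    Returns an (S,) array of split costs (inf if one customer alone exceeds capacity).
    """
    tour = np.asarray(tour)
    S, n = demands.shape
    # load[s, k] = total demand of the first k tour positions in scenario s.
    load = np.concatenate([np.zeros((S, 1)), np.cumsum(demands, axis=1)], axis=1)
    # path[k] = distance travelled along the tour from position 0 to position k.
    path = np.concatenate([[0.0], np.cumsum(dist[tour[:-1], tour[1:]])])
    out, back = dist[0, tour], dist[tour, 0]  # depot legs for each position

    F = np.full((S, n + 1), np.inf)  # F[s, j]: best cost to serve first j positions
    F[:, 0] = 0.0
    for j in range(1, n + 1):  # DP layers stay sequential
        i = np.arange(j)       # candidate routes serve positions i..j-1
        route_cost = out[i] + (path[j - 1] - path[i]) + back[j - 1]  # shape (j,)
        feasible = (load[:, j, None] - load[:, i]) <= capacity       # shape (S, j)
        cand = np.where(feasible, F[:, :j] + route_cost, np.inf)
        F[:, j] = cand.min(axis=1)  # min-plus reduction, all scenarios at once
    return F[:, n]

# Toy instance (hypothetical): depot at x=0, customers at x=1,2,3 on a line.
x = np.array([0.0, 1.0, 2.0, 3.0])
dist = np.abs(x[:, None] - x[None, :])
demands = np.array([[3, 3, 3], [6, 6, 6], [4, 4, 6]], dtype=float)
costs = batched_split(dist, [1, 2, 3], demands, capacity=10.0)
saa_estimate = costs.mean()  # sample-average second-stage cost
```

On this toy instance the per-scenario costs are 6, 12, and 8. The first scenario fits in one route, the second needs three, and the third splits after the first customer. Only the loop over DP layers is sequential. The scenario and predecessor axes are the ones a GPU kernel could spread over a 2D thread grid.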
