GPU-based Split algorithm for Large-Scale CVRPSD
Jingyi Zhao, Linxin Yang, Haohua Zhang, Tian Ding
TL;DR
This work tackles the scalability bottleneck of scenario-based stochastic optimization for routing problems by reformulating forward dynamic programming recursions as batched min-plus operations on layered DAGs and executing them on GPUs. By precomputing masks that enforce capacity constraints and exploiting two-dimensional parallelism across scenarios and transitions, the approach evaluates over $10^6$ demand realizations within practical runtimes, which reduces the estimation bias of sample average approximation (SAA) and improves first-stage decisions. The method is instantiated for the capacitated vehicle routing problem with stochastic demand (CVRPSD) and for dynamic stochastic inventory routing, achieving speedups of up to $65\times$ over CPU baselines, near-linear scaling in the number of scenarios, and tighter SAA estimates under fixed time budgets. Overall, the work provides a general, GPU-enabled recipe for turning classical DP routines into high-throughput primitives, broadening the computational frontier for stochastic discrete optimization in logistics and related domains.
Abstract
Dynamic programming (DP) is a cornerstone of combinatorial optimization, yet its inherently sequential structure has long limited its scalability in scenario-based stochastic programming (SP). This paper introduces a GPU-accelerated framework that reformulates a broad class of forward DP recursions as batched min-plus matrix-vector products over layered DAGs, collapsing actions into masked state-to-state transitions that map directly to GPU kernels. With this reformulation, the approach exploits massive parallelism across both scenarios and transitions, enabling the simultaneous evaluation of over one million uncertainty realizations in a single GPU pass, a scale far beyond the reach of existing methods. We instantiate the framework in two canonical applications: the capacitated vehicle routing problem with stochastic demand (CVRPSD) and a dynamic stochastic inventory routing problem. In both cases, DP subroutines traditionally considered sequential are redesigned to harness two- or three-dimensional GPU parallelism. Experiments demonstrate near-linear scaling in the number of scenarios and speedups of one to three orders of magnitude over multithreaded CPU baselines, resulting in tighter sample average approximation (SAA) estimates and significantly stronger first-stage decisions under fixed time budgets. Beyond these applications, our work establishes a general-purpose recipe for transforming classical DP routines into high-throughput GPU primitives, substantially expanding the computational frontier of stochastic discrete optimization to the million-scenario scale.
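To make the reformulation concrete, the following is a minimal CPU sketch in NumPy of a Split-style forward DP written as a masked min-plus recursion and batched over demand scenarios. It is an illustration under stated assumptions, not the authors' implementation: the function name `split_costs`, the argument layout, and the toy instance are hypothetical, the giant tour is assumed fixed, and a route is assumed infeasible in a scenario whenever its total demand exceeds the vehicle capacity. On a GPU, the same array operations would run through a GPU array library or custom kernels.

```python
import numpy as np


def split_costs(dist, tour, demands, capacity):
    """Batched Split DP over a fixed giant tour (illustrative sketch).

    dist:     (n+1, n+1) distance matrix, node 0 is the depot
    tour:     length-n sequence of customer ids (the giant tour)
    demands:  (S, n+1) demand per scenario and node (column 0 unused)
    capacity: vehicle capacity
    returns:  (S,) cheapest split cost per scenario (inf if no feasible split)
    """
    tour = np.asarray(tour)
    n = len(tour)
    S = demands.shape[0]

    # Scenario-independent arc costs on the layered DAG with nodes 0..n:
    # arc (i, j), i < j, is the route depot -> tour[i], ..., tour[j-1] -> depot.
    leg = dist[tour[:-1], tour[1:]]
    prefix_len = np.concatenate(([0.0], np.cumsum(leg)))
    out, back = dist[0, tour], dist[tour, 0]
    cost = np.full((n + 1, n + 1), np.inf)
    i_idx, j_idx = np.triu_indices(n + 1, k=1)
    cost[i_idx, j_idx] = (out[i_idx] + prefix_len[j_idx - 1]
                          - prefix_len[i_idx] + back[j_idx - 1])

    # Precomputed capacity masks, one per scenario: 0 if the route fits, inf otherwise.
    load = np.concatenate((np.zeros((S, 1)),
                           np.cumsum(demands[:, tour], axis=1)), axis=1)
    route_load = load[:, None, :] - load[:, :, None]   # [s, i, j] = load of arc (i, j)
    mask = np.where(route_load <= capacity, 0.0, np.inf)
    weights = cost[None, :, :] + mask                  # (S, n+1, n+1)

    # Forward recursion: value[s, j] = min_i value[s, i] + weights[s, i, j].
    # Each step is a min-plus vector-column product, batched over all scenarios.
    value = np.full((S, n + 1), np.inf)
    value[:, 0] = 0.0
    for j in range(1, n + 1):
        value[:, j] = np.min(value[:, :j] + weights[:, :j, j], axis=1)
    return value[:, n]


# Toy instance: depot at position 0 and customers at positions 1, 2, 3 on a line.
positions = np.arange(4, dtype=float)
dist = np.abs(positions[:, None] - positions[None, :])
demands = np.array([[0, 1, 1, 1],
                    [0, 2, 2, 2],
                    [0, 1, 2, 1],
                    [0, 4, 1, 1]], dtype=float)
costs = split_costs(dist, [1, 2, 3], demands, capacity=3.0)
print(costs)  # per-scenario split costs: 6, 12, 8, inf
```

In this sketch the loop over `j` stays sequential, while every step works on all scenarios and all predecessor nodes at once, which is the two-dimensional parallelism the abstract refers to. Averaging the finite entries of `costs` would give an SAA-style estimate of the expected routing cost for the fixed tour. The fourth scenario has no feasible split because one customer's demand alone exceeds the capacity.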
