GPU-based Split algorithm for Large-Scale CVRPSD

Jingyi Zhao, Linxin Yang, Haohua Zhang, Tian Ding

TL;DR

This work tackles the scalability bottleneck of scenario-based stochastic optimization for routing problems by reformulating forward dynamic programming recursions as batched min-plus operations on layered DAGs and executing them on GPUs. By precomputing masks to enforce capacity constraints and leveraging two-dimensional parallelism across scenarios and transitions, the approach enables evaluating over $10^6$ demand realizations within practical runtimes, significantly reducing estimation bias in SAA and improving first-stage decisions. The method is instantiated for CVRPSD and dynamic stochastic inventory routing, achieving speedups of up to $65\times$ over CPU baselines and near-linear scalability with the number of scenarios, as well as enabling tighter decision-making under fixed time budgets. Overall, the work provides a general, GPU-enabled recipe to transform classical DP routines into high-throughput primitives, broadening the computational frontier for stochastic discrete optimization in logistics and related domains.

Abstract

Dynamic programming (DP) is a cornerstone of combinatorial optimization, yet its inherently sequential structure has long limited its scalability in scenario-based stochastic programming (SP). This paper introduces a GPU-accelerated framework that reformulates a broad class of forward DP recursions as batched min-plus matrix-vector products over layered DAGs, collapsing actions into masked state-to-state transitions that map seamlessly to GPU kernels. Using this reformulation, our approach takes advantage of massive parallelism across both scenarios and transitions, enabling the simultaneous evaluation of over one million uncertainty realizations in a single GPU pass, a scale far beyond the reach of existing methods. We instantiate the framework in two canonical applications: the capacitated vehicle routing problem with stochastic demand and a dynamic stochastic inventory routing problem. In both cases, DP subroutines traditionally considered sequential are redesigned to harness two- or three-dimensional GPU parallelism. Experiments demonstrate near-linear scaling in the number of scenarios and yield one to three orders of magnitude speedups over multithreaded CPU baselines, resulting in tighter SAA estimates and significantly stronger first-stage decisions under fixed time budgets. Beyond these applications, our work establishes a general-purpose recipe for transforming classical DP routines into high-throughput GPU primitives, substantially expanding the computational frontier of stochastic discrete optimization to the million-scenario scale.
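The masked min-plus recursion described in the abstract can be sketched on the CPU for a Split-style DP. This is a minimal illustration under stated assumptions, not the authors' GPU kernel: the function names (`route_costs`, `capacity_mask`, `split_batch`), the toy distance matrix, and the demand scenarios are all hypothetical. Route costs depend only on distances, so they are computed once; only the capacity mask changes from scenario to scenario.

```python
import math

INF = math.inf

def route_costs(dist, tour):
    """cost[i][j]: cost of one route depot -> tour[i:j] -> depot (node 0 is the depot)."""
    n = len(tour)
    cost = [[INF] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        inner = 0.0  # travel between consecutive customers inside the route
        for j in range(i + 1, n + 1):
            if j > i + 1:
                inner += dist[tour[j - 2]][tour[j - 1]]
            cost[i][j] = dist[0][tour[i]] + inner + dist[tour[j - 1]][0]
    return cost

def capacity_mask(demands, capacity):
    """mask[i][j] is True when the total demand of tour[i:j] fits in one vehicle."""
    n = len(demands)
    prefix = [0.0]
    for d in demands:
        prefix.append(prefix[-1] + d)
    return [[j > i and prefix[j] - prefix[i] <= capacity for j in range(n + 1)]
            for i in range(n + 1)]

def split_batch(dist, tour, scenarios, capacity):
    """Optimal split cost of the giant tour for every demand scenario."""
    cost = route_costs(dist, tour)  # scenario-independent, computed once
    n = len(tour)
    results = []
    for demands in scenarios:  # scenario axis: independent, so parallelizable
        mask = capacity_mask(demands, capacity)
        value = [0.0] + [INF] * n
        for j in range(1, n + 1):  # one masked min-plus update per layer
            value[j] = min(
                (value[i] + cost[i][j] for i in range(j) if mask[i][j]),
                default=INF,
            )
        results.append(value[n])
    return results

# Toy instance: depot 0 and customers 1..3 on a line, dist = |a - b|.
dist = [[abs(a - b) for b in range(4)] for a in range(4)]
tour = [1, 2, 3]
scenarios = [[3, 3, 3], [6, 6, 3], [6, 6, 6]]  # demands in tour order
costs = split_batch(dist, tour, scenarios, capacity=10)
print(costs)  # [6.0, 8.0, 12.0]
```

The outer loop over scenarios and the inner minimum over predecessors `i` are the two axes that the abstract describes as running in parallel; here they are plain Python loops so the recursion itself stays easy to read.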

Paper Structure

This paper contains 17 sections, 1 theorem, 5 equations, 5 figures.

Key Result

Proposition 1

Let $\{ \tilde{\xi}^1, \dots, \tilde{\xi}^m \}$ be i.i.d. samples of $\tilde{\xi}$. Denote by the true and sample average problems, respectively, with optimal solutions $x^*$ and $x_m^*$. Then:

Figures (5)

  • Figure 1: Example of splitting a giant tour into feasible routes under a given demand scenario.
  • Figure 2: Example of the vectorized Split algorithm.
  • Figure 3: Runtime comparison for a Split evaluation under $10^4$–$10^6$ demand scenarios. GPU parallelization scales almost linearly with the number of scenarios, while CPU runtimes grow super-linearly and quickly become prohibitive.
  • Figure 4: Out-of-sample performance of first-stage solutions obtained with varying numbers of observed scenarios. Larger evaluation sets yield more robust and lower-cost solutions.
  • Figure 5: Quality of the best solution obtained at each time under a fixed time budget. GPU consistently achieves better decisions due to faster evaluation and thus larger effective search effort.
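The benefit of larger scenario sets reported in the figure captions is consistent with a standard property of sample average approximation (SAA): for a minimization problem, the optimal value of the sampled problem underestimates the true optimum on average, and the gap narrows as the number of scenarios $m$ grows. The sketch below illustrates this on a toy problem that is an assumption made for illustration and not an experiment from the paper: two candidate decisions whose costs are both standard normal, so the true optimal value is exactly 0. The function names are hypothetical.

```python
import random

def saa_optimal_value(m, rng):
    """Optimal value of one sampled problem built from m scenarios.

    Toy model (assumed): two candidate decisions, each with cost ~ N(0, 1),
    so the true optimal expected cost is 0.
    """
    sample_means = [
        sum(rng.gauss(0.0, 1.0) for _ in range(m)) / m
        for _decision in range(2)
    ]
    return min(sample_means)  # SAA picks the decision that looks cheapest

def average_saa_value(m, replications=2000, seed=0):
    """Monte Carlo estimate of the expected SAA optimal value for sample size m."""
    rng = random.Random(seed)
    total = sum(saa_optimal_value(m, rng) for _ in range(replications))
    return total / replications

small = average_saa_value(m=4)
large = average_saa_value(m=100)
print(f"m=4: {small:.3f}   m=100: {large:.3f}   true optimum: 0.000")
```

Both averages come out below the true optimum of 0, and the estimate for the larger sample sits much closer to it. Shrinking this optimistic bias is one reason evaluating many more scenarios per candidate solution can pay off.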

Theorems & Definitions (2)

  • Example 1
  • Proposition 1