Going faster to see further: GPU-accelerated value iteration and simulation for perishable inventory control using JAX
Joseph Farrington, Kezhi Li, Wai Keong Wong, Martin Utley
TL;DR
This work addresses the computational intractability of finding optimal replenishment policies for perishable inventory with complex age-structured state spaces. It introduces a JAX-based, GPU-accelerated value-iteration framework (using vmap/pmap) and gymnax-based simulators to solve large Markov decision processes. This extends tractability to problems with millions of states and to features such as lead time, substitution, and uncertain useful life. The authors demonstrate feasibility across three scenarios. Heuristic policies achieve near-optimal performance, with a maximum optimality gap of 2.49%. Wall time is substantially reduced compared with CPU baselines, and problems with over 16 million states converge in hours on consumer GPUs and in minutes on multi-GPU clusters. The work provides open-source code and Colab resources, making rigorous benchmarking and faster experimentation possible on accessible hardware, with practical impact for operational research and inventory management. More broadly, it enables large-scale, realistic policy evaluation and benchmark development, potentially accelerating advances in perishable logistics and related stochastic optimization problems.
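To illustrate the vmap-based idea, here is a minimal sketch of synchronous value iteration on a deliberately tiny perishable-inventory MDP. It is not the authors' implementation. The model and all names and constants in it (`MAX_STOCK`, `PRICE`, `step`, `bellman`, `value_iteration`, and so on) are hypothetical assumptions chosen for brevity:

* a useful life of three periods,
* zero lead time,
* oldest-first (FIFO) issuing,
* Poisson demand,
* a discounted reward with a sup-norm stopping rule.

The state is the age profile of stock on hand. Nested `jax.vmap` calls evaluate every demand outcome, every order quantity, and every state inside one compiled Bellman update.

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import poisson

# Hypothetical toy model (assumption): stock lasts 3 periods, orders arrive
# immediately, stock is issued oldest first, daily demand is Poisson(5).
MAX_STOCK = 10                      # cap per age class and on order size
MAX_DEMAND = 15                     # demand support truncated here
PRICE, UNIT_COST, WASTE_COST = 1.0, 0.4, 0.2
GAMMA = 0.95                        # discount factor (assumption)

demands = jnp.arange(MAX_DEMAND + 1)
pmf = poisson.pmf(demands, 5.0)
pmf = pmf.at[-1].add(1.0 - pmf.sum())    # fold truncated tail into last bucket
actions = jnp.arange(MAX_STOCK + 1)      # order quantities 0..MAX_STOCK

# State = age profile (x1, x2): units with 1 and 2 periods of life left.
grid = jnp.arange(MAX_STOCK + 1)
x1s, x2s = jnp.meshgrid(grid, grid, indexing="ij")
states = jnp.stack([x1s.ravel(), x2s.ravel()], axis=1)   # shape (121, 2)


def step(state, action, demand):
    """One-period transition for a fixed demand value: (reward, next state index)."""
    x1, x2 = state[0], state[1]
    sold1 = jnp.minimum(x1, demand)                  # oldest units sold first
    sold2 = jnp.minimum(x2, demand - sold1)
    sold_new = jnp.minimum(action, demand - sold1 - sold2)
    waste = x1 - sold1                               # unsold oldest units expire
    reward = (PRICE * (sold1 + sold2 + sold_new)
              - UNIT_COST * action - WASTE_COST * waste)
    # Remaining stock ages by one period: x2 leftovers -> x1, new leftovers -> x2.
    next_idx = (x2 - sold2) * (MAX_STOCK + 1) + (action - sold_new)
    return reward, next_idx


def q_value(state, action, V):
    """Expected one-step reward plus discounted value, averaged over demand."""
    rewards, next_idx = jax.vmap(step, in_axes=(None, None, 0))(state, action, demands)
    return jnp.sum(pmf * (rewards + GAMMA * V[next_idx]))


def state_update(state, V):
    """Bellman backup for one state: best value and its maximising order."""
    qs = jax.vmap(q_value, in_axes=(None, 0, None))(state, actions, V)
    return qs.max(), qs.argmax()


@jax.jit
def bellman(V):
    """Update every state simultaneously from the previous value vector."""
    return jax.vmap(state_update, in_axes=(0, None))(states, V)


def value_iteration(tol=1e-4, max_iter=1000):
    V = jnp.zeros(states.shape[0])
    for i in range(max_iter):
        V_new, policy = bellman(V)
        if float(jnp.max(jnp.abs(V_new - V))) < tol:
            return V_new, policy, i + 1
        V = V_new
    return V, policy, max_iter


V, policy, n_iter = value_iteration()
print(f"{n_iter} sweeps; order quantity with no stock on hand: {int(policy[0])}")
```

Each state's update depends only on the previous value vector, so a whole sweep is one batched computation that a GPU can spread across many cores. The state count here is `(MAX_STOCK + 1) ** 2`. It grows exponentially with the number of age classes, which is why realistic problems reach millions of states. The TL;DR also mentions `pmap` for multi-device execution, which this single-device sketch does not use.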
Abstract
Value iteration can find the optimal replenishment policy for a perishable inventory problem, but it is computationally demanding because of the large state spaces required to represent the age profile of stock. The parallel processing capabilities of modern GPUs can reduce the wall time required to run value iteration by updating many states simultaneously. The adoption of GPU-accelerated approaches has been limited in operational research relative to other fields like machine learning, in which new software frameworks have made GPU programming widely accessible. We used the Python library JAX to implement value iteration and simulators of the underlying Markov decision processes in a high-level API, and relied on this library's function transformations and compiler to efficiently utilize GPU hardware. Our method can extend the use of value iteration to settings that were previously considered infeasible or impractical. We demonstrate this on example scenarios from three recent studies, which include problems with over 16 million states and additional problem features, such as substitution between products, that increase computational complexity. We compare the performance of the optimal replenishment policies to heuristic policies fitted using simulation optimization in JAX, which allowed parallel evaluation of multiple candidate policy parameters on thousands of simulated years. The heuristic policies gave a maximum optimality gap of 2.49%. Our general approach may be applicable to a wide range of problems in operational research that would benefit from large-scale parallel computation on consumer-grade GPU hardware.
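The simulation-optimization step can be sketched in the same hedged way. This is a simplified stand-in for the paper's procedure, with several assumptions:

* the same hypothetical toy model,
* a hypothetical order-up-to-S heuristic (`S` is the target total stock),
* an exhaustive grid over S instead of whatever search method the authors used.

The names `simulate_year`, `evaluate`, and `N_YEARS` are illustrative only. Within this structure, `jax.lax.scan` steps through one simulated year, the inner `jax.vmap` runs many independent years in parallel, and the outer `jax.vmap` evaluates every candidate S at once.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy model (assumption): 3-period useful life, zero lead time,
# oldest-first issuing, Poisson demand; heuristic = order up to S units.
PRICE, UNIT_COST, WASTE_COST = 1.0, 0.4, 0.2
MEAN_DEMAND = 5.0
DAYS_PER_YEAR = 365
N_YEARS = 1000                       # simulated years per candidate


def simulate_year(S, key):
    """Total reward of the order-up-to-S policy over one simulated year."""
    demands = jax.random.poisson(key, MEAN_DEMAND, shape=(DAYS_PER_YEAR,)).astype(jnp.int32)

    def day(carry, demand):
        x1, x2 = carry                                # units with 1 and 2 days left
        order = jnp.maximum(S - x1 - x2, 0)          # top total stock back up to S
        sold1 = jnp.minimum(x1, demand)
        sold2 = jnp.minimum(x2, demand - sold1)
        sold_new = jnp.minimum(order, demand - sold1 - sold2)
        waste = x1 - sold1                           # oldest unsold units expire
        reward = (PRICE * (sold1 + sold2 + sold_new)
                  - UNIT_COST * order - WASTE_COST * waste)
        return (x2 - sold2, order - sold_new), reward

    init = (jnp.int32(0), jnp.int32(0))              # start with no stock
    _, rewards = jax.lax.scan(day, init, demands)
    return rewards.sum()


@jax.jit
def evaluate(candidates, key):
    """Mean annual reward for every candidate S, all evaluated in parallel."""
    keys = jax.random.split(key, N_YEARS)            # same keys reused for every S
    over_years = jax.vmap(simulate_year, in_axes=(None, 0))
    per_year = jax.vmap(over_years, in_axes=(0, None))(candidates, keys)
    return per_year.mean(axis=1)                     # shape (len(candidates),)


candidates = jnp.arange(0, 21, dtype=jnp.int32)
scores = evaluate(candidates, jax.random.PRNGKey(0))
best_S = int(candidates[jnp.argmax(scores)])
print(f"best order-up-to level: {best_S}, mean annual reward: {float(scores.max()):.1f}")
```

Every candidate sees the same demand sequences (common random numbers). Differences in mean reward therefore reflect the policy parameter rather than sampling noise. With these assumed sizes, the grid covers 21 × 1000 simulated years in a single compiled call.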
