Reinforcement Learning for Solving Stochastic Vehicle Routing Problem with Time Windows
Zangir Iklassov, Ikboljon Sobirov, Ruben Solozabal, Martin Takac
TL;DR
This paper addresses the Stochastic Vehicle Routing Problem (SVRP) with time windows by introducing a reinforcement learning framework that accounts for stochastic demands and uncertain travel costs, while incorporating external information and respecting customer time windows. It develops an attention-based policy trained with REINFORCE (a policy-gradient method) to minimize expected routing cost, including a recourse cost for failure scenarios, and compares it against Clarke-Wright, Tabu Search, and Ant Colony Optimization baselines. The RL approach achieves a 1.73% reduction in travel cost over Ant Colony Optimization, the strongest of the classical baselines, and remains robust across environmental configurations, inference strategies, and problem sizes. The study also examines how external variables, the inference technique (greedy, sampling, or beam search), and the stochastic components each affect performance, and offers the framework as a benchmark for SVRP research and industry applications.
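For readers unfamiliar with REINFORCE, the standard score-function gradient estimator it relies on can be written as follows. This is a generic sketch, not the paper's exact training objective: the symbols $\theta$ (policy parameters), $\pi$ (a sampled routing solution with actions $a_t$ and states $s_t$), $C(\pi)$ (realized routing cost including any recourse cost), $b$ (a variance-reducing baseline), $B$ (batch size), and $\alpha$ (learning rate) are notation assumed here for illustration.

```latex
\begin{align}
J(\theta) &= \mathbb{E}_{\pi \sim p_\theta}\bigl[ C(\pi) \bigr], \\
\nabla_\theta J(\theta)
  &= \mathbb{E}_{\pi \sim p_\theta}\Bigl[ \bigl( C(\pi) - b \bigr)\,
     \nabla_\theta \log p_\theta(\pi) \Bigr],
  \qquad
  \log p_\theta(\pi) = \sum_{t} \log p_\theta(a_t \mid s_t), \\
\nabla_\theta J(\theta)
  &\approx \frac{1}{B} \sum_{i=1}^{B} \bigl( C(\pi_i) - b \bigr)\,
     \nabla_\theta \log p_\theta(\pi_i),
  \qquad
  \theta \leftarrow \theta - \alpha\, \nabla_\theta J(\theta).
\end{align}
```

Because $J$ is a cost, the update is a descent step. Subtracting $b$ leaves the estimator unbiased as long as $b$ does not depend on the sampled actions.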
Abstract
This paper introduces a reinforcement learning approach to the Stochastic Vehicle Routing Problem with Time Windows (SVRP), focusing on reducing travel costs in goods delivery. We develop a novel SVRP formulation that accounts for uncertain travel costs and demands, alongside customer-specific time windows. An attention-based neural network trained through reinforcement learning is employed to minimize routing costs. By leveraging machine learning, our approach addresses a gap in SVRP research, which has traditionally relied on heuristic methods. The model outperforms the Ant Colony Optimization algorithm, achieving a 1.73% reduction in travel costs. It also integrates external information and remains robust across diverse environments, which makes it a useful benchmark for future SVRP studies and industry applications.
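To make the notion of uncertain demand and recourse cost concrete, the following minimal sketch estimates the expected travel cost of a fixed route by Monte Carlo sampling. Everything in it is an illustrative assumption rather than the paper's formulation: the function names `route_cost` and `expected_cost`, the Gaussian demand noise, and the recourse rule (a round trip to the depot to refill when realized demand exceeds the remaining load). Time windows and stochastic travel costs are left out for brevity.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def route_cost(route, coords, demands, capacity, depot=0):
    """Travel cost of one route under one realized demand scenario.

    Assumed recourse rule (hypothetical): if a customer's realized demand
    exceeds the remaining load, the vehicle delivers what it has, makes a
    round trip to the depot to refill, and then finishes the delivery.
    Requires capacity > 0.
    """
    cost, load, pos = 0.0, capacity, depot
    for c in route:
        cost += dist(coords[pos], coords[c])
        need = demands[c]
        while need > load:  # route failure: pay the recourse round trip
            need -= load
            cost += 2.0 * dist(coords[c], coords[depot])
            load = capacity
        load -= need
        pos = c
    return cost + dist(coords[pos], coords[depot])  # return to the depot


def expected_cost(route, coords, mean_demand, capacity, n_samples=1000, seed=0):
    """Monte Carlo estimate of expected cost under assumed Gaussian demand noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sample one demand scenario; demands are clipped at zero.
        demands = {
            c: max(0.0, rng.gauss(mean_demand[c], 0.2 * mean_demand[c]))
            for c in route
        }
        total += route_cost(route, coords, demands, capacity)
    return total / n_samples


if __name__ == "__main__":
    coords = [(0, 0), (3, 0), (3, 4)]  # index 0 is the depot
    mean_demand = {1: 1.0, 2: 1.0}
    print(expected_cost([1, 2], coords, mean_demand, capacity=2.2))
```

A learned policy would be scored against the same kind of sampled scenarios. The point of the sketch is only that a route which looks cheap under mean demand can become more expensive once recourse trips are counted.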
