Neural Combinatorial Optimization Algorithms for Solving Vehicle Routing Problems: A Comprehensive Survey with Perspectives
Xuan Wu, Di Wang, Lijie Wen, Yubin Xiao, Chunguo Wu, Yuesong Wu, Chaoyu Yu, Douglas L. Maskell, You Zhou
TL;DR
This survey addresses the need for an up-to-date, comprehensive view of neural combinatorial optimization solvers for VRPs, including variants such as TSP and CVRP. It introduces a four-category taxonomy (Learning to Construct (L2C), Learning to Improve (L2I), Learning to Predict-Once (L2P-O), and Learning to Predict-Multiplicity (L2P-M)) and systematically analyzes encoder/decoder designs, MDP formulations, and data augmentation and post-processing strategies across these classes. The authors identify four key inadequacies (generalization, large-scale solving, VRP variants, and fair comparisons) and summarize ongoing efforts and promising directions to address them, including divide-and-conquer (D&C), diffusion-based methods, region-attention, and multi-task learning. A live repository accompanies the survey to track emerging solvers, aiming to foster progress and broader adoption among both the OR community and ML researchers working on VRPs.
Abstract
Although several surveys on Neural Combinatorial Optimization (NCO) solvers designed for Vehicle Routing Problems (VRPs) have been conducted, they do not cover the state-of-the-art (SOTA) NCO solvers that have emerged recently. More importantly, to establish a comprehensive and up-to-date taxonomy of NCO solvers, we systematically review relevant publications and preprints, categorizing them into four distinct types: Learning to Construct, Learning to Improve, Learning to Predict-Once, and Learning to Predict-Multiplicity solvers. Subsequently, we present the inadequacies of the SOTA solvers, including poor generalization, an inability to solve large-scale VRPs, an inability to address most types of VRP variants simultaneously, and the difficulty of comparing these NCO solvers with conventional Operations Research algorithms. We also discuss ongoing efforts, identify open inadequacies, and propose promising and viable directions to overcome them. Notably, existing efforts focus on only one or two of these inadequacies, with none attempting to address all of them concurrently. In addition, we compare the performance of representative NCO solvers from the Reinforcement, Supervised, and Unsupervised Learning paradigms across VRPs of varying scales. Finally, following the proposed taxonomy, we provide an accompanying web page as a live repository for NCO solvers. Through this survey and the live repository, we aim to foster further advancements in the NCO community.
