
Neural Combinatorial Optimization Algorithms for Solving Vehicle Routing Problems: A Comprehensive Survey with Perspectives

Xuan Wu, Di Wang, Lijie Wen, Yubin Xiao, Chunguo Wu, Yuesong Wu, Chaoyu Yu, Douglas L. Maskell, You Zhou

TL;DR

This survey addresses the need for an up-to-date, comprehensive view of neural combinatorial optimization (NCO) solvers for VRPs, covering problems such as the TSP, the CVRP, and further variants. It introduces a four-category taxonomy: Learning to Construct (L2C), Learning to Improve (L2I), Learning to Predict-Once (L2P-O), and Learning to Predict-Multiplicity (L2P-M). Across these classes, it systematically analyzes encoder/decoder designs, MDP formulations, and data augmentation and post-processing strategies. The authors identify four key inadequacies (generalization, large-scale solving, coverage of VRP variants, and fair comparison with conventional solvers) and summarize ongoing efforts and promising directions to address them, including divide-and-conquer (D&C), diffusion-based methods, region attention, and multi-task learning. A live repository accompanies the survey to track emerging solvers, with the aim of fostering progress and broader adoption among both OR and ML researchers working on VRPs.

Abstract

Although several surveys on Neural Combinatorial Optimization (NCO) solvers specifically designed to solve Vehicle Routing Problems (VRPs) have been conducted, they do not cover the state-of-the-art (SOTA) NCO solvers that have emerged recently. More importantly, to establish a comprehensive and up-to-date taxonomy of NCO solvers, we systematically review relevant publications and preprints, categorizing them into four distinct types, namely Learning to Construct, Learning to Improve, Learning to Predict-Once, and Learning to Predict-Multiplicity solvers. Subsequently, we present the inadequacies of the SOTA solvers, including poor generalization, inability to solve large-scale VRPs, inability to address most types of VRP variants simultaneously, and difficulty in comparing these NCO solvers with conventional Operations Research algorithms. We also discuss ongoing efforts, identify open inadequacies, and propose promising and viable directions to overcome them. Notably, existing efforts focus on only one or two of these inadequacies, with none attempting to address all of them concurrently. In addition, we compare the performance of representative NCO solvers from the Reinforcement, Supervised, and Unsupervised Learning paradigms across VRPs of varying scales. Finally, following the proposed taxonomy, we provide an accompanying web page as a live repository for NCO solvers. Through this survey and the live repository, we aim to foster further advancements in the NCO community.

Paper Structure (38 sections, 15 equations, 7 figures, 6 tables)


Figures (7)

  • Figure 1: Number of publications and preprints on NCO solvers for VRPs. The data were gathered from Google Scholar and Web of Science, up to the end of 2023, with the keywords "Neural Combinatorial Optimization" OR "NCO" OR "Reinforcement Learning" OR "Deep Learning" OR "Neural Network" AND "Vehicle Routing Problem" OR "VRP" OR "Traveling Salesman Problem" OR "TSP". After this initial collection, each retrieved item was examined individually to determine whether it falls within the scope of NCO.
  • Figure 2: Illustration of interactions between the agent and the environment in RL, where $t$, $a_t$, $s_t$, $\pi$, and $r_t$ denote the time, action, state, policy, and reward, respectively. The policy $\pi(\cdot|s)$ of the agent takes the state $s_t$ as input and selects the action $a_t$ as output. Following this, the action is executed in the environment, and the agent receives a reward $r_t$. This sequential process continues as the state transitions to $s_{t+1}$.
  • Figure 3: Illustration of the generic construction process of L2C solvers, starting from an empty solution set and ending with complete solutions. Most L2C solvers are composed of an encoder and a decoder. The encoder is used to output the embeddings of VRP instances, while the decoder selects nodes based on these embeddings to construct complete solutions.
  • Figure 4: Illustration of AM (Kool et al., 2019).
  • Figure 5: Illustration of the iterative solution-improvement process of L2I solvers, which starts from an initial complete solution and stops when a given time budget is exhausted. L2I solvers first rely on regional strategies to select regions (typically node pairs). Subsequently, diverse strategies are employed to repair the sub-tours of the selected regions.
  • ...and 2 more figures
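
The captions for Figures 3 and 5 describe two generic loops: L2C solvers build a solution node by node from an empty set, while L2I solvers start from a complete solution and repeatedly select a region (typically a node pair) and repair the sub-tour between it. The following is a minimal Python sketch of both loops on a TSP instance, under stated assumptions rather than as any surveyed solver's implementation. All function names are hypothetical. A nearest-neighbor score stands in for a learned encoder/decoder, uniformly random pair selection stands in for a learned regional strategy, a 2-opt reversal stands in for the repair step, and an iteration budget stands in for the time limit.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits `coords` in the order given by `tour`."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def nearest_neighbor_score(coords, current, candidate):
    """Placeholder scoring rule (NOT a neural network): closer nodes score higher."""
    return -math.dist(coords[current], coords[candidate])


def construct_tour(coords, score_fn, start=0):
    """L2C-style construction: grow a partial solution one node at a time.

    `score_fn` is a hypothetical stand-in for a learned decoder that ranks
    the unvisited nodes given the current partial solution.
    """
    tour = [start]
    unvisited = set(range(len(coords))) - {start}
    while unvisited:
        current = tour[-1]
        nxt = max(unvisited, key=lambda j: score_fn(coords, current, j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def improve_tour(coords, tour, max_iters=1000, seed=0):
    """L2I-style improvement: select a node pair, repair the sub-tour between it.

    Pair selection is uniformly random here (assumption); the repair is a
    2-opt segment reversal, accepted only if it shortens the tour.
    Requires at least 3 nodes.
    """
    rng = random.Random(seed)
    best = list(tour)
    best_len = tour_length(coords, best)
    n = len(best)
    for _ in range(max_iters):
        i, j = sorted(rng.sample(range(1, n), 2))  # keep the start node fixed
        candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
        cand_len = tour_length(coords, candidate)
        if cand_len < best_len - 1e-12:
            best, best_len = candidate, cand_len
    return best


# Usage: construct a tour on random points, then try to improve it.
rng = random.Random(42)
coords = [(rng.random(), rng.random()) for _ in range(20)]
initial = construct_tour(coords, nearest_neighbor_score)
improved = improve_tour(coords, initial)
print(tour_length(coords, initial), tour_length(coords, improved))
```

Because the improvement loop only accepts strictly shorter candidates, the returned tour is never longer than the constructed one. A learned solver would replace the two placeholder rules with trained policies.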