Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems via Test-Time Projection Learning

Yuanyao Chen, Rongsheng Chen, Fu Luo, Zhenkun Wang

TL;DR

This work tackles the zero-shot generalization gap of neural combinatorial optimization on large-scale VRPs by introducing TTPL, an LLM-driven test-time projection learning framework. TTPL learns projection strategies during inference to align the training and testing distributions, enabling a backbone model trained on 100-node instances to solve TSP and CVRP instances with up to 100K nodes without retraining. To improve robustness, the authors add Multi-View Decision Fusion (MVDF), which enforces transformation-invariant features by aggregating multiple subgraph perspectives. Experiments on synthetic and real-world benchmarks show substantial gains over strong baselines; ablations validate the projection and MVDF components, and versatility studies show that the approach applies across models and distributions. The work advances practical large-scale VRP solving by reducing reliance on manual redesign and costly retraining, though the authors note slower convergence during LLM-driven optimization as an area for future work.

Abstract

Neural Combinatorial Optimization (NCO) has emerged as a promising learning-based paradigm for addressing Vehicle Routing Problems (VRPs) by minimizing the need for extensive manual engineering. While existing NCO methods, trained on small-scale instances (e.g., 100 nodes), have demonstrated considerable success on problems of similar scale, their performance significantly degrades when applied to large-scale scenarios. This degradation arises from the distributional shift between training and testing data, rendering policies learned on small instances ineffective for larger problems. To overcome this limitation, we introduce a novel learning framework driven by Large Language Models (LLMs). This framework learns a projection between the training and testing distributions, which is then deployed to enhance the scalability of the NCO model. Notably, unlike prevailing techniques that necessitate joint training with the neural network, our approach operates exclusively during the inference phase, obviating the need for model retraining. Extensive experiments demonstrate that our method enables a backbone model (trained on 100-node instances) to achieve superior performance on large-scale Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) of up to 100K nodes from diverse distributions.
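
As a rough illustration of what "projecting" a test-time subproblem toward the training distribution can look like, the sketch below extracts the $k$ nearest unvisited nodes around the current node (the Figure 1 caption uses $k$=100) and rescales them into the unit square, where 100-node uniform training instances are assumed to live. The helper names `knn_subgraph` and `project_to_unit_square` are hypothetical, and the fixed min-max rescaling is only a hand-written stand-in: in TTPL the projection strategy is learned by the LLM-driven search, not hard-coded like this.

```python
import math
import random


def knn_subgraph(candidates, current, k):
    """Return the k candidate nodes nearest to `current` (Euclidean distance)."""
    return sorted(candidates, key=lambda p: math.dist(p, current))[:k]


def project_to_unit_square(points):
    """Uniformly rescale points so their bounding box fits inside [0, 1]^2.

    Assumption: a simple min-max normalization used purely for illustration.
    It is NOT the projection that TTPL learns.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # One shared scale factor keeps relative distances between nodes intact.
    span = max(max(xs) - x_min, max(ys) - y_min) or 1.0  # guard against zero span
    return [((x - x_min) / span, (y - y_min) / span) for x, y in points]


# Toy large-scale instance: 5,000 uniformly sampled nodes.
random.seed(0)
nodes = [(random.random(), random.random()) for _ in range(5000)]
current = nodes[0]

subgraph = knn_subgraph(nodes[1:], current, k=100)
projected = project_to_unit_square(subgraph + [current])
```

Because a single scale factor is applied to both axes, the relative geometry of the subgraph is preserved; only its size and position change, so a solver trained on unit-square instances sees an input at a familiar scale.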

Paper Structure

This paper contains 54 sections, 21 equations, 7 figures, 15 tables, 2 algorithms.

Figures (7)

  • Figure 1: (a) Solution construction process for a TSP5K instance; gray nodes, blue nodes, the red node, and the gray line denote the unvisited nodes, the KNN ($k$=100) of the current node, the current node, and the constructed partial route, respectively. (b) Extracted KNN graph; RWD [wangefficient] indicates the distance between the input graph and a uniformly distributed training instance. (c) Projected KNN graph. (d) Training instance with 100 nodes.
  • Figure 2: The pipeline of the TTPL framework. It comprises four components: initialization, fitness evaluation, offspring generation, and population update. (a) Initialization: TTPL establishes the initial population by prompting an LLM with a predefined template to generate individuals. After initialization, an iterative optimization procedure searches for the optimal individual. (b) Offspring generation: Offspring individuals are produced using several LLM-based evolutionary prompt strategies. (c) Fitness evaluation: An NCO model assesses the performance of the newly generated individuals. (d) Population update: The highest-performing individuals are selected to form the next generation, and the process repeats until the specified termination criteria are satisfied.
  • Figure 3: The solution visualizations of a TSP5K instance with uniform distribution.
  • Figure 4: The solution visualizations of a TSP5K instance with cluster distribution.
  • Figure 5: The solution visualizations of a TSP5K instance with explosion distribution.
  • ...and 2 more figures
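
The four stages described in the Figure 2 caption follow the shape of a standard evolutionary loop, which can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: `evolve`, `init_fn`, `offspring_fn`, and `fitness_fn` are hypothetical names, the LLM prompts and the NCO-based evaluation are replaced by plain Python callables, lower fitness is assumed to be better, and a fixed generation count stands in for the unspecified termination criteria.

```python
import random


def evolve(init_fn, offspring_fn, fitness_fn, pop_size=4, generations=10, seed=0):
    """Generic evolutionary loop mirroring the four stages in Figure 2.

    init_fn(rng)               -> individual (stand-in for the LLM template prompt)
    offspring_fn(parents, rng) -> individual (stand-in for LLM evolutionary prompts)
    fitness_fn(individual)     -> float, lower is better (stand-in for NCO evaluation)
    """
    rng = random.Random(seed)

    # (a) Initialization: build and score the initial population.
    scored = [(fitness_fn(ind), ind) for ind in (init_fn(rng) for _ in range(pop_size))]
    scored.sort(key=lambda pair: pair[0])

    for _ in range(generations):
        # (b) Offspring generation from the current population.
        parents = [ind for _, ind in scored]
        children = [offspring_fn(parents, rng) for _ in range(pop_size)]

        # (c) Fitness evaluation of the new individuals.
        scored += [(fitness_fn(child), child) for child in children]

        # (d) Population update: keep only the best pop_size individuals.
        scored.sort(key=lambda pair: pair[0])
        scored = scored[:pop_size]

    return scored[0]  # (best fitness, best individual)


# Toy problem: each individual is a single scale factor; the optimum is 3.0.
def toy_init(rng):
    return rng.uniform(0.0, 10.0)


def toy_offspring(parents, rng):
    return rng.choice(parents) + rng.gauss(0.0, 0.5)


def toy_fitness(scale):
    return abs(scale - 3.0)


best_fitness, best_individual = evolve(toy_init, toy_offspring, toy_fitness)
```

Because parents and offspring compete in the same pool during the update step, the best fitness never gets worse from one generation to the next. In TTPL each individual would be a projection strategy proposed by the LLM rather than a number.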