Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems via Test-Time Projection Learning
Yuanyao Chen, Rongsheng Chen, Fu Luo, Zhenkun Wang
TL;DR
This work tackles the zero-shot generalization gap of neural combinatorial optimization (NCO) on large-scale VRPs by introducing TTPL, an LLM-driven test-time projection learning framework. TTPL learns projection strategies during inference to align the training and testing distributions, enabling a backbone model trained on 100-node instances to solve TSP and CVRP instances of up to 100K nodes without retraining. To improve robustness, the authors add Multi-View Decision Fusion (MVDF), which enforces transformation-invariant features by aggregating multiple subgraph perspectives. Experiments on synthetic and real-world benchmarks show substantial gains over strong baselines; ablations validate the projection and MVDF components, and versatility studies show that the approach applies across models and distributions. The work advances practical large-scale VRP solving by reducing reliance on manual redesign and costly retraining, though the authors note slow convergence of the LLM-driven optimization as an area for future work.
Abstract
Neural Combinatorial Optimization (NCO) has emerged as a promising learning-based paradigm for addressing Vehicle Routing Problems (VRPs) by minimizing the need for extensive manual engineering. While existing NCO methods, trained on small-scale instances (e.g., 100 nodes), have demonstrated considerable success on problems of similar scale, their performance degrades significantly when applied to large-scale scenarios. This degradation arises from the distributional shift between training and testing data, which renders policies learned on small instances ineffective for larger problems. To overcome this limitation, we introduce a novel learning framework driven by Large Language Models (LLMs). This framework learns a projection between the training and testing distributions, which is then deployed to enhance the scalability of the NCO model. Notably, unlike prevailing techniques that necessitate joint training with the neural network, our approach operates exclusively during the inference phase, obviating the need for model retraining. Extensive experiments demonstrate that our method enables a backbone model (trained on 100-node instances) to achieve superior performance on large-scale Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) instances of up to 100K nodes from diverse distributions.
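To make the idea of an inference-time projection concrete, the following is a minimal sketch and not the projection that the framework learns. It assumes, purely for illustration, that the backbone was trained on coordinates in the unit square and that the projection consists of selecting the k nearest neighbors of the current node and uniformly rescaling that subgraph into [0, 1]^2. The function names `knn_subgraph` and `project_to_unit_square` and the sample coordinates are hypothetical.

```python
import math

def knn_subgraph(coords, current, k):
    """Return indices of the k nodes nearest to `current` (excluding itself)."""
    cx, cy = coords[current]
    others = [i for i in range(len(coords)) if i != current]
    others.sort(key=lambda i: math.hypot(coords[i][0] - cx, coords[i][1] - cy))
    return others[:k]

def project_to_unit_square(points):
    """Uniformly rescale points so their bounding box fits inside [0, 1]^2.

    Hand-written stand-in for a learned projection (an assumption of this
    sketch). A single scale factor is used so relative distances are kept.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 if all points coincide, to avoid division by zero.
    scale = max(max(xs) - x_min, max(ys) - y_min) or 1.0
    return [((x - x_min) / scale, (y - y_min) / scale) for x, y in points]

# Toy large-scale instance: three clustered nodes and one far-away node.
coords = [(1000.0, 1000.0), (1002.0, 1001.0), (1001.0, 1004.0), (5000.0, 5000.0)]
neighbors = knn_subgraph(coords, current=0, k=2)
projected = project_to_unit_square([coords[0]] + [coords[i] for i in neighbors])
```

In this sketch the small-scale backbone would be given `projected` rather than the raw coordinates, so that its input resembles the scale it saw during training. No model weights are modified.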
