
Bridging Synthetic and Real Routing Problems via LLM-Guided Instance Generation and Progressive Adaptation

Jianghan Zhu, Yaoxin Wu, Zhuoyi Lin, Zhengyuan Zhang, Haiyan Yin, Zhiguang Cao, Senthilnath Jayavelu, Xiaoli Li

TL;DR

EvoReal tackles the generalization gap of neural VRP solvers when moving from synthetic, uniformly distributed data to real-world VRP benchmarks (TSPLib and CVRPLib). It introduces an LLM-guided evolutionary framework that designs and evolves data generators to produce structurally realistic VRP instances, followed by a two-phase progressive fine-tuning of pre-trained neural solvers that aligns them with real distributions and scales. The approach achieves state-of-the-art generalization across problem sizes, reducing the gap to optimal solutions to 1.05% on TSPLib and 2.71% on CVRPLib without changing model architectures. This data-centric method shows the value of using LLMs for distributional alignment and offers a practical pathway to deploying neural VRP solvers in real-world settings.
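The headline numbers are gaps to the optimal (or best-known) solution. As a minimal sketch, assuming the conventional per-instance definition averaged over a benchmark (the exact aggregation used for the reported figures is not specified here), the gap can be computed as follows. The helper names `optimality_gap` and `mean_gap` and the example costs are hypothetical illustrations, not results from the paper.

```python
from typing import List, Tuple


def optimality_gap(cost: float, optimal_cost: float) -> float:
    """Relative gap to optimal in percent: 100 * (cost - opt) / opt."""
    if optimal_cost <= 0:
        raise ValueError("optimal_cost must be positive")
    return 100.0 * (cost - optimal_cost) / optimal_cost


def mean_gap(results: List[Tuple[float, float]]) -> float:
    """Average per-instance gap over (solver_cost, optimal_cost) pairs."""
    gaps = [optimality_gap(cost, opt) for cost, opt in results]
    return sum(gaps) / len(gaps)


# Made-up costs for two instances: gaps of 1.0% and 1.5%, mean 1.25%.
print(mean_gap([(101.0, 100.0), (203.0, 200.0)]))
```

Averaging per-instance percentages (rather than dividing total cost by total optimal cost) keeps large instances from dominating the metric; which convention a given benchmark table uses should be checked against the paper.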

Abstract

Recent advances in Neural Combinatorial Optimization (NCO) methods have significantly improved the capability of neural solvers to handle synthetic routing instances. Nonetheless, existing neural solvers typically struggle to generalize from synthetic, uniformly distributed training data to real-world VRP scenarios, including widely recognized benchmark instances from TSPLib and CVRPLib. To bridge this generalization gap, we present Evolutionary Realistic Instance Synthesis (EvoReal), which leverages an evolutionary module guided by large language models (LLMs) to generate synthetic instances with diverse and realistic structural patterns. Specifically, the evolutionary module produces synthetic instances whose structural attributes statistically mimic those observed in authentic real-world instances. Pre-trained NCO models are then progressively refined: first aligned with these structurally enriched synthetic distributions, then further adapted through direct fine-tuning on actual benchmark instances. Extensive experiments demonstrate that EvoReal markedly improves the generalization of state-of-the-art neural solvers, reducing the gap to optimal solutions to 1.05% on TSPLib and 2.71% on CVRPLib across a broad spectrum of problem scales.
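The evolutionary module is described here only at a high level; the Figure 2 caption adds the steps: pairwise short-term reflection, reflection-guided crossover, distillation into long-term reflections, mutation of the current best, and rank-and-select to a fixed population size. The following is a minimal sketch of that control flow, assuming each LLM call can be abstracted as a plain callable. `LLMOperators`, `evolve`, and the toy numeric "genomes" are hypothetical stand-ins, not the authors' implementation; in EvoReal the genomes would be generator programs and fitness would be a proxy evaluation on the validation set.

```python
import random
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

G = TypeVar("G")  # a candidate generator; here a toy number, in EvoReal generator code


@dataclass
class LLMOperators(Generic[G]):
    """Hypothetical hooks standing in for the reflector/designer LLM prompts."""
    reflect: Callable[[G, G], str]        # short-term reflection on (better, worse)
    crossover: Callable[[G, G, str], G]   # offspring guided by that reflection
    distill: Callable[[List[str]], str]   # accumulated short-term -> long-term reflection
    mutate: Callable[[G, str], G]         # long-term-guided edit of the current best


def evolve(population: List[G], fitness: Callable[[G], float],
           ops: "LLMOperators[G]", iterations: int, rng: random.Random) -> G:
    """Reflective evolution loop; lower fitness is better (e.g. a proxy gap)."""
    size = len(population)
    memory: List[str] = []  # accumulated short-term reflections
    for _ in range(iterations):
        parents = population[:]
        rng.shuffle(parents)
        offspring: List[G] = []
        # Pair parents, order each pair by fitness, reflect, then cross over.
        for a, b in zip(parents[0::2], parents[1::2]):
            better, worse = (a, b) if fitness(a) <= fitness(b) else (b, a)
            note = ops.reflect(better, worse)
            memory.append(note)
            offspring.append(ops.crossover(better, worse, note))
        # Distill reflections and use them to mutate the current best candidate.
        long_term = ops.distill(memory)
        elite = min(population + offspring, key=fitness)
        offspring.append(ops.mutate(elite, long_term))
        # Rank parents and offspring together; keep a fixed-size population.
        population = sorted(population + offspring, key=fitness)[:size]
    return min(population, key=fitness)


# Toy usage: numbers as genomes, fitness = squared distance to 3.0.
rng = random.Random(0)
toy_ops = LLMOperators(
    reflect=lambda better, worse: f"prefer values near {better:.2f}",
    crossover=lambda better, worse, note: better + 0.5 * (better - worse),
    distill=lambda notes: notes[-1] if notes else "",
    mutate=lambda best, note: best + rng.gauss(0.0, 0.1),
)
initial = [0.0, 1.0, 5.0, 8.0]
best = evolve(initial, lambda x: (x - 3.0) ** 2, toy_ops, iterations=20, rng=rng)
print(best)
```

Because parents and offspring are ranked together before truncation, the best fitness can never get worse from one iteration to the next; this elitist choice mirrors the "ranked and selected to maintain a fixed size" step in the caption.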


Paper Structure

This paper contains 28 sections, 1 equation, 15 figures, 9 tables, 1 algorithm.

Figures (15)

  • Figure 1: Overall workflow of EvoReal including LLM-guided generator evolution and progressive fine-tuning. Top Left: Validation set and unseen test set are split, with validation problems grouped by distribution category for structural-specific generator design. Top Right: LLM-driven module evolves generators which are evaluated on specific validation sets. Bottom: Pre-trained models are progressively fine-tuned on data from the evolved generators and the validation set’s real instances.
  • Figure 2: LLM-driven evolution component in Fig. 1. The pipeline within the blue rectangle block is repeated for N iterations. Dotted arrows represent the proxy evaluation of each generator; black arrows indicate the flow of generators. For each pair of parents, the reflector LLM performs short-term reflection based on their relative performance, and this insight guides crossover to design new offspring. Accumulated short-term reflections are further distilled into long-term ones, which guide mutation to improve the current best generator. After mutation, the population is ranked and selected to maintain a fixed size.
  • Figure 3: Comparison of the performance of the evolved generator with five naive-distribution generators.
  • Figure 4: Left: FFT-energy threshold separating S1-type from non-S1-type instances. Right: NN-ratio threshold separating S2-type from S3-type instances. Both division thresholds are marked with black dotted lines.
  • Figure 5: Box plots of FFT energy and NN-ratio for all validation instances after instance segmentation. The corresponding statistics for 5000 TSP100 instances sampled from a uniform distribution are included for comparison.
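Figures 4 and 5 split validation instances into S1, S2, and S3 types using an FFT-energy threshold and an NN-ratio threshold, but neither statistic is defined in this summary. The sketch below shows how such a two-threshold categorization could work under loudly stated assumptions: the NN-ratio is taken to be the Clark-Evans nearest-neighbour ratio, "FFT energy" is taken to be the share of non-DC spectral energy in a 2-D point-density histogram, S1 is assumed to lie above the FFT-energy threshold, and S2 below the NN-ratio threshold. The function names, the `bins` parameter, and all threshold values are hypothetical; coordinates are assumed scaled to the unit square.

```python
import numpy as np


def nn_ratio(coords: np.ndarray) -> float:
    """Clark-Evans ratio: observed mean nearest-neighbour distance divided by
    its expectation under complete spatial randomness (area assumed 1).
    Values well below 1 indicate clustering; values near 1 look uniform."""
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)              # ignore self-distances
    observed = dist.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / 1.0)           # CSR expectation, unit area
    return float(observed / expected)


def fft_energy(coords: np.ndarray, bins: int = 8) -> float:
    """Share of 2-D spectral energy outside the DC term of a density grid.
    0 for a perfectly flat histogram; approaches 1 as points concentrate."""
    hist, _, _ = np.histogram2d(coords[:, 0], coords[:, 1], bins=bins,
                                range=[[0.0, 1.0], [0.0, 1.0]])
    power = np.abs(np.fft.fft2(hist)) ** 2
    total = power.sum()
    return float((total - power[0, 0]) / total)


def categorize(coords: np.ndarray, fft_threshold: float,
               nn_threshold: float) -> str:
    """Hypothetical two-stage split mirroring Figure 4 (directions assumed)."""
    if fft_energy(coords) > fft_threshold:
        return "S1"
    return "S2" if nn_ratio(coords) < nn_threshold else "S3"


rng = np.random.default_rng(0)
uniform = rng.random((1000, 2))
clustered = rng.normal(0.5, 0.01, size=(200, 2))
print(fft_energy(uniform), nn_ratio(uniform))
print(categorize(clustered, fft_threshold=0.5, nn_threshold=0.8))
```

Both statistics are cheap to compute per instance, which is what makes a threshold-based grouping of validation instances practical; the pairwise-distance step is O(n²) in memory, so a KD-tree would be preferable for instances much larger than a few thousand nodes.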