Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM
Rongjie Zhu, Cong Zhang, Zhiguang Cao
TL;DR
The paper tackles the Capacitated Vehicle Routing Problem (CVRP) by embedding a small, specialized LLM within an optimization solver. It introduces RFTHGS, an RL-based fine-tuning framework that trains the LLM, using a curriculum reward and anti-plagiarism operator caching, to generate high-performance crossover operators for the Hybrid Genetic Search (HGS) solver. Empirical results show that the LLM-generated operators outperform expert-designed components across CVRPLIB benchmarks, generalize to problems with up to 1000 nodes, and surpass prompting-based methods and commercial LLMs such as GPT-4o. This work demonstrates that a compact, task-focused LLM can exceed handcrafted solver components, suggesting a viable path toward automated design of solver primitives for complex combinatorial optimization problems (COPs).
Abstract
While large language models (LLMs) are increasingly used as automated heuristic designers for vehicle routing problems (VRPs), current state-of-the-art methods predominantly rely on prompting massive, general-purpose models like GPT-4. This work challenges that paradigm by demonstrating that a smaller, specialized LLM, when meticulously fine-tuned, can generate components that surpass expert-crafted heuristics within advanced solvers. We propose RFTHGS, a novel Reinforcement learning (RL) framework for Fine-Tuning a small LLM to generate high-performance crossover operators for the Hybrid Genetic Search (HGS) solver, applied to the Capacitated VRP (CVRP). Our method employs a multi-tiered, curriculum-based reward function that progressively guides the LLM to generate operators that are first compilable, then executable, and finally superior in performance to human expert designs. This is coupled with an operator caching mechanism that discourages plagiarism and promotes diversity during training. Comprehensive experiments show that our fine-tuned LLM produces crossover operators that significantly outperform the expert-designed ones in HGS. The performance advantage remains consistent, generalizing from small-scale instances to large-scale problems with up to 1000 nodes. Furthermore, RFTHGS exceeds the performance of leading neuro-combinatorial baselines, prompt-based methods, and commercial LLMs such as GPT-4o and GPT-4o-mini.
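To make the reward design concrete, the following is a minimal Python sketch of how a tiered reward and an operator cache could fit together. The abstract does not give reward values, the cache key, or how duplicates are scored, so the constants, the whitespace-normalized hash, and the names `OperatorCache` and `tiered_reward` are all assumptions for illustration, not the authors' implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical reward levels; the paper's actual values are not stated here.
R_DUPLICATE = -1.0      # operator already seen (or copied from the expert one)
R_COMPILE_FAIL = 0.0    # tier 0: source does not compile
R_RUNTIME_FAIL = 0.25   # tier 1: compiles, but fails when executed
R_NO_GAIN = 0.5         # tier 2: runs, but is no better than the baseline
R_BETTER_BASE = 1.0     # tier 3: beats the baseline; a bonus is added on top


@dataclass
class OperatorCache:
    """Remembers operators seen so far, keyed by a hash of their source."""
    seen: set = field(default_factory=set)

    @staticmethod
    def _key(source: str) -> str:
        # Assumption: collapse whitespace so trivial reformatting is not "new".
        normalized = " ".join(source.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def is_duplicate(self, source: str) -> bool:
        return self._key(source) in self.seen

    def add(self, source: str) -> None:
        self.seen.add(self._key(source))


def tiered_reward(source: str, compiles: bool, runs: bool,
                  cost: Optional[float], baseline_cost: float,
                  cache: OperatorCache) -> float:
    """Score one generated operator; lower routing cost is better."""
    if cache.is_duplicate(source):
        return R_DUPLICATE
    cache.add(source)
    if not compiles:
        return R_COMPILE_FAIL
    if not runs or cost is None:
        return R_RUNTIME_FAIL
    if cost >= baseline_cost:
        return R_NO_GAIN
    # Bonus grows with the relative improvement over the baseline operator.
    return R_BETTER_BASE + (baseline_cost - cost) / baseline_cost
```

Seeding the cache with the expert operator's source before training is one simple way to penalize verbatim copies of it, for example `cache.add(expert_source)` followed by `tiered_reward(candidate, True, True, 98.0, 100.0, cache)`. In a real pipeline the `compiles`, `runs`, and `cost` arguments would come from building the candidate into HGS and evaluating it on training instances.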
