Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM

Rongjie Zhu, Cong Zhang, Zhiguang Cao

TL;DR

The paper tackles the Capacitated Vehicle Routing Problem (CVRP) by using a small, specialized LLM to design a component of an optimization solver. It introduces RFTHGS, an RL-based framework that fine-tunes the LLM to generate high-performance crossover operators for the Hybrid Genetic Search (HGS) solver, guided by a curriculum reward and an operator cache that discourages plagiarism. Empirical results show that the LLM-generated operators outperform the expert-designed ones on CVRPLIB benchmarks, generalize to problems with up to 1000 nodes, and surpass prompting-based and GPT-based baselines. The work demonstrates that a compact, task-focused LLM can exceed handcrafted solver components, suggesting a viable path toward automated design of solver primitives for complex combinatorial optimization problems (COPs).

Abstract

While large language models (LLMs) are increasingly used as automated heuristic designers for vehicle routing problems (VRPs), current state-of-the-art methods predominantly rely on prompting massive, general-purpose models like GPT-4. This work challenges that paradigm by demonstrating that a smaller, specialized LLM, when meticulously fine-tuned, can generate components that surpass expert-crafted heuristics within advanced solvers. We propose RFTHGS, a novel Reinforcement learning (RL) framework for Fine-Tuning a small LLM to generate high-performance crossover operators for the Hybrid Genetic Search (HGS) solver, applied to the Capacitated VRP (CVRP). Our method employs a multi-tiered, curriculum-based reward function that progressively guides the LLM to master generating first compilable, then executable, and finally, superior-performing operators that exceed human expert designs. This is coupled with an operator caching mechanism that discourages plagiarism and promotes diversity during training. Comprehensive experiments show that our fine-tuned LLM produces crossover operators which significantly outperform the expert-designed ones in HGS. The performance advantage remains consistent, generalizing from small-scale instances to large-scale problems with up to 1000 nodes. Furthermore, RFTHGS exceeds the performance of leading neuro-combinatorial baselines, prompt-based methods, and commercial LLMs such as GPT-4o and GPT-4o-mini.

Paper Structure

This paper contains 23 sections, 4 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: The reinforcement learning pipeline of RFTHGS. The framework iteratively optimizes an LLM to generate effective crossover operators for HGS. Each iteration consists of generating code from a structured prompt, evaluating the operator's performance on a validation set (using incremental compilation for speed), calculating a multi-faceted reward, and updating the LLM policy. The LLM only sees operator examples, not problem instances or the solver codebase.
  • Figure 2: (a) HGS as the environment for evaluating the quality of LLM-generated operators. Incremental compilation is used to speed up the computation of objective values. (b) The multi-faceted reward function.
  • Figure 3: Training dynamics of the RFTHGS framework. (a) Average reward per step, showing stable convergence. (b) Evolution of the reward distribution, illustrating the effectiveness of the multi-faceted reward function in guiding the learning process.
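
The abstract describes a reward that moves through three tiers (compilable, then executable, then better than the expert operator) together with an operator cache that discourages plagiarism. The following is a minimal Python sketch of how such a reward could be wired up; it is not the authors' implementation. The names `EvalResult`, `OperatorCache`, and `tiered_reward`, every numeric reward value, and the whitespace-insensitive hash used to detect repeated operators are assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class EvalResult:
    """Hypothetical outcome of running one generated operator inside the solver."""
    compiled: bool
    executed: bool
    cost: Optional[float] = None  # mean objective on a validation set; lower is better


@dataclass
class OperatorCache:
    """Hypothetical cache of operators already seen during training."""
    seen: Set[str] = field(default_factory=set)

    @staticmethod
    def fingerprint(code: str) -> str:
        # Assumption: two operators count as identical if they match after
        # removing all whitespace. A real system might compare ASTs instead.
        normalized = "".join(code.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def is_duplicate(self, code: str) -> bool:
        return self.fingerprint(code) in self.seen

    def add(self, code: str) -> None:
        self.seen.add(self.fingerprint(code))


# All reward magnitudes below are illustrative assumptions, not values from the paper.
DUPLICATE_PENALTY = -0.5
R_NOT_COMPILED = 0.0
R_COMPILED_ONLY = 0.25
R_EXECUTED_NOT_BETTER = 0.5
R_BETTER_BASE = 1.0


def tiered_reward(code: str, result: EvalResult,
                  baseline_cost: float, cache: OperatorCache) -> float:
    """Return a curriculum-style reward for one generated operator."""
    if cache.is_duplicate(code):
        return DUPLICATE_PENALTY      # discourage copying known operators
    cache.add(code)
    if not result.compiled:
        return R_NOT_COMPILED         # tier 0: does not compile
    if not result.executed or result.cost is None:
        return R_COMPILED_ONLY        # tier 1: compiles but fails at run time
    if result.cost >= baseline_cost:
        return R_EXECUTED_NOT_BETTER  # tier 2: runs, no gain over the expert operator
    # tier 3: beats the expert operator; add the relative improvement as a bonus
    return R_BETTER_BASE + (baseline_cost - result.cost) / baseline_cost
```

In this sketch the cache can be seeded with the expert-designed operator (for example, the one shown to the LLM in the prompt) so that returning it unchanged earns the penalty rather than a tier-2 reward.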