HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges

Xianliang Yang, Ling Zhang, Haolong Qian, Lei Song, Jiang Bian

TL;DR

HeurAgenix addresses the adaptability gap in combinatorial optimization by introducing a two-stage, LLM-driven framework that first evolves a diverse pool of heuristics and then adaptively selects among them at test time. The evolution phase uses contrastive analysis and an LLM to extract reusable improvement strategies, while the problem-solving phase employs a lightweight, fine-tunable selector with a dual-reward scheme (POR and CPR) and test-time scaling to robustly pick heuristics under noisy supervision. Empirical results across five canonical CO benchmarks show that HeurAgenix outperforms existing LLM-based hyper-heuristics and matches or exceeds specialized solvers, with a GitHub implementation provided. The work contributes a fully end-to-end, data-driven workflow for autonomous heuristic design and adaptive selection, highlighting practical impact for scalable CO solving in diverse domains.

Abstract

Heuristic algorithms play a vital role in solving combinatorial optimization (CO) problems, yet traditional designs depend heavily on manual expertise and struggle to generalize across diverse instances. We introduce HeurAgenix, a two-stage hyper-heuristic framework powered by large language models (LLMs) that first evolves heuristics and then selects among them automatically. In the heuristic evolution phase, HeurAgenix leverages an LLM to compare seed heuristic solutions with higher-quality solutions and extract reusable evolution strategies. During problem solving, it dynamically picks the most promising heuristic for each problem state, guided by the LLM's perception ability. For flexibility, this selector can be either a state-of-the-art LLM or a fine-tuned lightweight model with lower inference cost. To mitigate the scarcity of reliable supervision caused by CO complexity, we fine-tune the lightweight heuristic selector with a dual-reward mechanism that jointly exploits signals from selection preferences and state perception, enabling robust selection under noisy annotations. Extensive experiments on canonical benchmarks show that HeurAgenix not only outperforms existing LLM-based hyper-heuristics but also matches or exceeds specialized solvers. Code is available at https://github.com/microsoft/HeurAgenix.
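
As a rough illustration of the problem-solving stage described in the abstract (pick one heuristic per problem state, apply it, repeat), the following minimal Python sketch runs a selection loop on a toy TSP instance. It is not the HeurAgenix implementation: the names `solve`, `lookahead_selector`, `HEURISTICS`, and `POINTS` are hypothetical, the two heuristics are textbook constructions, and a one-step lookahead rule stands in for the LLM or fine-tuned selector.

```python
import math

def tour_length(points, tour):
    """Length of the closed tour that visits `tour` in order."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def nearest_neighbor(points, tour, unvisited):
    """Append the unvisited city closest to the last city of the tour."""
    last = points[tour[-1]]
    city = min(sorted(unvisited), key=lambda c: math.dist(last, points[c]))
    return tour + [city]

def cheapest_insertion(points, tour, unvisited):
    """Insert the (city, position) pair that yields the shortest closed tour."""
    best = None
    for city in sorted(unvisited):
        for pos in range(1, len(tour) + 1):
            candidate = tour[:pos] + [city] + tour[pos:]
            cost = tour_length(points, candidate)
            if best is None or cost < best[0]:
                best = (cost, candidate)
    return best[1]

def lookahead_selector(points, tour, unvisited, heuristics):
    """Stand-in selector (assumption): choose the heuristic whose next step
    gives the shortest partial tour. The paper uses an LLM-based selector."""
    return min(heuristics,
               key=lambda name: tour_length(points, heuristics[name](points, tour, unvisited)))

def solve(points, heuristics, selector):
    """Build a tour by selecting one heuristic per state and applying it."""
    tour, unvisited, trace = [0], set(range(1, len(points))), []
    while unvisited:
        name = selector(points, tour, unvisited, heuristics)
        tour = heuristics[name](points, tour, unvisited)
        unvisited -= set(tour)
        trace.append(name)  # record which heuristic was used at each step
    return tour, trace

# Toy instance with made-up coordinates, for illustration only.
POINTS = [(0, 0), (3, 0), (3, 4), (0, 4), (1, 2), (5, 1)]
HEURISTICS = {"nearest_neighbor": nearest_neighbor,
              "cheapest_insertion": cheapest_insertion}

tour, trace = solve(POINTS, HEURISTICS, lookahead_selector)
print(tour, trace, round(tour_length(POINTS, tour), 3))
```

Passing the selector as a parameter keeps the loop unchanged when the lookahead rule is replaced by a different policy, which mirrors the abstract's point that the selector can be swapped between a large LLM and a lightweight model.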

Paper Structure

This paper contains 43 sections, 10 equations, 9 figures, 6 tables, 3 algorithms.

Figures (9)

  • Figure 1: Overview of the HeurAgenix framework for automatic heuristic design and adaptive selection. In the heuristic evolution phase, an LLM autonomously discovers evolution strategies by analyzing contrastive solution tuples, while in the problem solving phase, an adaptive heuristic selection mechanism integrates Test-time Scaling (TTS) [wei2022chain, wang2022self].
  • Figure 2: Illustration of one heuristic-evolution step on a four-node TSP evolution instance. The cumulative effect of this and subsequent refinements can be seen in Figure 3.
  • Figure 3: Example of heuristic evolution for TSP. The left panel illustrates successive strategy refinements and their impact on TSPLIB [reinelt1991tsplib] performance. The right panel details a specific evolution step, where an alternative cost function is induced by the LLM based on counterfactual analysis. For a step-by-step extraction of the refinement highlighted in Round 2, see Figure 2; for further details and code, see the heuristic evolution example in the Appendix.
  • Figure 4: Detailed reward design, showing the operational mechanisms of the novel POR and CPR as well as the auxiliary Format Reward [grpo] and Language Rewards [deepseek2025].
  • Figure 5: Effect of noisy rollout data on heuristic selection (rd100 in TSPLIB [reinelt1991tsplib]). Y-axis: expected optimality gap (lower is better) after completing the tour by random sampling. X-axis: decision rounds. Blue: always selecting the best heuristic (oracle). Green: uniformly selecting from the top 30% heuristics (positive set). Red: random selection. Selecting from the positive set almost matches the oracle and clearly outperforms random choice.
  • ...and 4 more figures
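
The three selection policies compared in the Figure 5 caption (oracle, uniform choice from the top 30% "positive set", and uniform random choice) can be sketched for a single decision round as follows. This is a minimal sketch under stated assumptions: the per-heuristic gaps in `GAPS` are made-up illustrative numbers, not results from the paper, and `positive_set` and `expected_gap` are hypothetical helper names.

```python
def positive_set(gaps, top_percent=30):
    """Return the best `top_percent` percent of heuristics (at least one),
    ranked by optimality gap, lower being better."""
    n = len(gaps)
    k = max(1, (top_percent * n + 99) // 100)  # integer ceiling of top_percent% of n
    return sorted(gaps, key=gaps.get)[:k]

def expected_gap(gaps, candidates):
    """Expected gap when one heuristic is drawn uniformly from `candidates`."""
    return sum(gaps[h] for h in candidates) / len(candidates)

# Hypothetical optimality gaps for ten heuristics in one decision round.
GAPS = {f"h{i}": i / 100 for i in range(1, 11)}

positive = positive_set(GAPS)                 # top 30% of the pool
oracle_gap = min(GAPS.values())               # always pick the single best heuristic
positive_gap = expected_gap(GAPS, positive)   # uniform over the positive set
random_gap = expected_gap(GAPS, list(GAPS))   # uniform over all heuristics

print(positive, oracle_gap, positive_gap, random_gap)
```

Because the positive set contains only the best-ranked heuristics, its expected gap always lies between the oracle's and that of uniform random selection. How close it gets to the oracle depends on the spread of the gaps.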