Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models
Rui Zhang, Fei Liu, Xi Lin, Zhenkun Wang, Zhichao Lu, Qingfu Zhang
TL;DR
The paper investigates whether large language models (LLMs) alone can design effective heuristics for Automated Heuristic Design (AHD), or whether coupling LLMs with evolutionary program search (EPS) is essential. It introduces a large-scale benchmark covering four AHD problems, four LLM-based EPS methods plus a simple (1+1)-EPS baseline, nine LLMs, and five independent runs, to analyze both the necessity of search and the progress made by LLM-based EPS. Key findings show that standalone LLMs, even given large budgets or high model capacity, underperform search-augmented approaches, and that the performance of EPS methods depends on both the problem and the model, with no universally best method. The work highlights substantial search costs and variability across tasks, and advocates more diverse benchmarks and open-source reproducibility to guide future EPS algorithm development and LLM usage in AHD applications.
Abstract
Automated heuristic design (AHD) has gained considerable attention for its potential to automate the development of effective heuristics. The recent advent of large language models (LLMs) has opened a new avenue for AHD, with initial efforts focusing on framing AHD as an evolutionary program search (EPS) problem. However, inconsistent benchmark settings, inadequate baselines, and a lack of detailed component analysis have left both the necessity of integrating LLMs with search strategies and the true progress achieved by existing LLM-based EPS methods inadequately justified. This work addresses these research questions by conducting a large-scale benchmark comprising four LLM-based EPS methods and four AHD problems across nine LLMs and five independent runs. Our extensive experiments yield meaningful insights, providing empirical grounding for the importance of evolutionary search in LLM-based AHD approaches, while also contributing to the advancement of future EPS algorithmic development. To foster accessibility and reproducibility, we have fully open-sourced our benchmark and corresponding results.
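
As a rough illustration of what framing AHD as evolutionary program search can look like, the sketch below shows a greedy (1+1)-style loop: keep one incumbent candidate, propose one offspring, and keep the offspring only if it scores at least as well. This is a minimal sketch under stated assumptions, not the benchmark's implementation. The names `one_plus_one_search`, `propose`, and `evaluate` are hypothetical, and the LLM call that would normally generate a new heuristic program is replaced here by a random numeric perturbation so that the example is self-contained.

```python
import random
from typing import Callable, Tuple, TypeVar

C = TypeVar("C")  # candidate type; in real EPS this would be heuristic source code


def one_plus_one_search(
    initial: C,
    propose: Callable[[C], C],      # hypothetical: stands in for an LLM generating an offspring
    evaluate: Callable[[C], float],  # hypothetical: higher score means a better heuristic
    budget: int,
) -> Tuple[C, float]:
    """Greedy (1+1)-style search: one parent, one offspring per iteration."""
    parent = initial
    parent_score = evaluate(parent)
    for _ in range(budget):
        child = propose(parent)
        child_score = evaluate(child)
        # Accept the offspring only if it is at least as good as the parent.
        if child_score >= parent_score:
            parent, parent_score = child, child_score
    return parent, parent_score


def demo(seed: int = 0, budget: int = 200) -> Tuple[float, float]:
    """Toy stand-in: the 'heuristic' is a single number tuned toward 3.0."""
    rng = random.Random(seed)
    return one_plus_one_search(
        initial=0.0,
        propose=lambda p: p + rng.uniform(-1.0, 1.0),  # toy mutation, not an LLM
        evaluate=lambda p: -(p - 3.0) ** 2,            # toy fitness
        budget=budget,
    )
```

Because an offspring replaces the parent only when its score does not decrease, the returned score can never be worse than the initial candidate's score. In an actual LLM-based setting, `propose` would prompt a model with the parent program and `evaluate` would run the generated heuristic on problem instances, which is where most of the search cost would arise.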
