Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, Caglar Gulcehre

TL;DR

EvoTune addresses the challenge of discovering high-quality algorithms by bridging evolutionary search with reinforcement-learning fine-tuning of the LLM. By using evolutionary exploration to generate data and RL to update the LLM policy, EvoTune accelerates progress beyond static-generation baselines, while forward KL regularization maintains output diversity critical for exploration. Across bin packing, traveling salesman, flatpack, and broader Hash Code and LLM-SR benchmarks, EvoTune yields higher top performance and more unique solutions, often outperforming human heuristics and non-LLM baselines. The work demonstrates the viability and potential of RL-enhanced evolutionary strategies for automated algorithm design, with implications for scalable, data-efficient discovery in combinatorial optimization and beyond.

Abstract

Discovering efficient algorithms for solving complex problems has been an outstanding challenge in mathematics and computer science, requiring substantial human expertise over the years. Recent advancements in evolutionary search with large language models (LLMs) have shown promise in accelerating the discovery of algorithms across various domains, particularly in mathematics and optimization. However, existing approaches treat the LLM as a static generator, missing the opportunity to update the model with the signal obtained from evolutionary exploration. In this work, we propose to augment LLM-based evolutionary search by continuously refining the search operator (the LLM) through reinforcement learning (RL) fine-tuning. Our method leverages evolutionary search as an exploration strategy to discover improved algorithms, while RL optimizes the LLM policy based on these discoveries. Our experiments on combinatorial optimization tasks demonstrate that integrating RL with evolutionary search accelerates the discovery of superior algorithms, showcasing the potential of RL-enhanced evolutionary strategies for algorithm design.
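
The alternation described in the abstract (search to collect data, then a policy update on that data) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the authors' implementation: a "program" is reduced to a single float, `evaluate` is an invented reward, and `ToyPolicy.update` merely rescales the proposal step where the actual method would run RL fine-tuning on an LLM.

```python
import random


def evaluate(program):
    """Hypothetical reward: a toy 'program' is one float, best at 3.0."""
    return -abs(program - 3.0)


class ToyPolicy:
    """Stand-in for the LLM: proposes a random perturbation of a parent."""

    def __init__(self, step=1.0, seed=0):
        self.step = step
        self.rng = random.Random(seed)

    def propose(self, parent):
        return parent + self.rng.gauss(0.0, self.step)

    def update(self, database, k=5):
        # Placeholder for the RL phase (NOT real RL): shrink the proposal
        # scale to the spread of the current top-k programs.
        top = sorted(database, key=lambda pr: pr[1], reverse=True)[:k]
        values = [program for program, _ in top]
        self.step = max(0.05, max(values) - min(values))


def alternating_search(rounds=5, samples_per_round=20, seed=0):
    policy = ToyPolicy(seed=seed)
    database = [(0.0, evaluate(0.0))]  # (program, reward) pairs
    for _ in range(rounds):
        # Phase (a): evolutionary search, bootstrapping from the best
        # programs found so far.
        for _ in range(samples_per_round):
            top = sorted(database, key=lambda pr: pr[1], reverse=True)[:5]
            parent, _ = policy.rng.choice(top)
            child = policy.propose(parent)
            database.append((child, evaluate(child)))
        # Phase (b): update the policy on the data collected by the search.
        policy.update(database)
    return max(database, key=lambda pr: pr[1])


best_program, best_reward = alternating_search()
```

Because the initial program stays in the database, the returned reward can never be worse than the starting point; how much it improves depends on the seed and the toy update rule.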

Paper Structure

This paper contains 59 sections, 4 equations, 11 figures, 5 tables, 2 algorithms.

Figures (11)

  • Figure 1: Method overview: EvoTune iteratively alternates between two phases: (a) evolutionary search that iteratively improves solutions by bootstrapping from the best ones discovered so far, and (b) RL training, which updates the model parameters based on information gained from the search process. In this loop, evolutionary search is used to explore the space of programs efficiently and collect data, and RL is used to improve the policy based on the data generated with evolutionary search. Python programs generated by an LLM are evaluated on a set of combinatorial optimization problem instances and then stored in a program database for later use in RL training and prompt construction.
  • Figure 2: Top-50 rewards and the number of unique scores. The reward score of the best 50 generated programs (Top) and the number of programs with unique scores across different models (Bottom) for (a) flatpack, (b) bin packing, and (c) traveling salesman problem. The shaded areas denote the standard error computed over 10 seeds. Across all models and tasks, EvoTune finds higher-scoring best 50 programs. Additionally, it finds a greater number of uniquely scoring solutions.
  • Figure 3: (a) Evolution of optimality gap distributions. Histograms illustrating the distribution of optimality gap scores for programs in the program database at an early checkpoint with limited sampling budget (Left) and at the final checkpoint with full sampling budget (Right). The Top, Middle, and Bottom rows show results for the BP, TSP, and FP tasks, respectively. All results are averaged over 10 seeds. Throughout the search process, EvoTune produces a higher number of high-quality programs (indicated by lower optimality gap scores) compared to the baseline. (b) Forward KL vs. Reverse KL. Comparison of KL variants based on the reward of the top 50 programs (Top) and the number of unique scores (Bottom). Forward KL yields higher rewards and a higher number of unique solutions, which we attribute to a higher diversity of outputs.
  • Figure 4: Results on two Hash Code problems and two problems from LLM-SR. EvoTune consistently outperforms FunSearch across all problems, discovering higher-scoring best 50 solutions and showing higher diversity, as measured by the number of discovered solutions with unique scores. Shaded regions show standard deviation across 4 seeds.
  • Figure 5: Average reward scores of valid sampled programs. The shaded areas represent the standard error over 10 seeds. Our method outperforms the baseline in terms of its outputs having a better average score.
  • ...and 6 more figures
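
The Forward KL vs. Reverse KL comparison in Figure 3(b) rests on the asymmetry of the KL divergence, which a small numeric sketch can make concrete. The distributions below are invented for illustration, and the argument order used for "forward" (reference first) follows the common convention; it is an assumption here, not something this page confirms about the paper's exact objective.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical two-outcome example: the reference spreads its mass evenly,
# while the policy has nearly collapsed onto the first outcome.
reference = [0.5, 0.5]
collapsed_policy = [0.99, 0.01]

forward_kl = kl(reference, collapsed_policy)  # KL(reference || policy)
reverse_kl = kl(collapsed_policy, reference)  # KL(policy || reference)

print(f"forward KL: {forward_kl:.3f}, reverse KL: {reverse_kl:.3f}")
```

In this example the forward direction assigns the larger penalty to the collapsed policy, which is consistent with the caption's attribution of the forward-KL gains to higher output diversity.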