ImprovEvolve: Ask AlphaEvolve to Improve the Input Solution and Then Improvise

Alexey Kravatskiy, Valentin Khrulkov, Ivan Oseledets

TL;DR

ImprovEvolve applies LLM-guided evolution to hard mathematical optimization problems by decomposing the search into a modular class with three methods, generate_config, improve, and perturb, and combining them in a basin-hopping framework. Using two-stage validation and a MAP-Elites backbone, it performs well on rugged problems: the evolved program achieves state-of-the-art hexagon packings for 11, 12, 15, and 16 hexagons, and a human-edited program improves the second-autocorrelation lower bound from AlphaEvolve's 0.96102 to 0.96258 when resuming from AlphaEvolve solutions with expert edits. The results highlight the value of task decomposition and human–AI collaboration in mathematical discovery and suggest a blueprint for applying LLM-guided optimization to other complex domains. The work acknowledges limitations in generalizability and the need for careful hyperparameter and edit management, and it points toward future agentic and collaborative workflows that blend AI automation with domain expertise.

Abstract

Recent advances in LLM-guided evolutionary computation, particularly AlphaEvolve, have demonstrated remarkable success in discovering novel mathematical constructions and solving challenging optimization problems. In this article, we present ImprovEvolve, a simple yet effective technique for enhancing LLM-based evolutionary approaches such as AlphaEvolve. Given an optimization problem, the standard approach is to evolve program code that, when executed, produces a solution close to the optimum. We propose an alternative program parameterization that maintains the ability to construct optimal solutions while reducing the cognitive load on the LLM. Specifically, we evolve a program (implementing, e.g., a Python class with a prescribed interface) that provides the following functionality: (1) propose a valid initial solution, (2) improve any given solution in terms of fitness, and (3) perturb a solution with a specified intensity. The optimum can then be approached by iteratively applying improve() and perturb() with a scheduled intensity. We evaluate ImprovEvolve on challenging problems from the AlphaEvolve paper: hexagon packing in a hexagon and the second autocorrelation inequality. For hexagon packing, the evolved program achieves new state-of-the-art results for 11, 12, 15, and 16 hexagons; a lightly human-edited variant further improves results for 14, 17, and 23 hexagons. For the second autocorrelation inequality, the human-edited program achieves a new state-of-the-art lower bound of 0.96258, improving upon AlphaEvolve's 0.96102.
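
The three-method interface described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' evolved program: the class name ToySolver, the toy objective, the greedy random local search inside improve, the Gaussian perturbation, and the accept-if-better rule in run_schedule are all hypothetical stand-ins.

```python
import math
import random
from typing import List


class ToySolver:
    """Hypothetical class exposing generate_config, improve, and perturb."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    @staticmethod
    def fitness(x: List[float]) -> float:
        # Toy rugged objective (higher is better, maximum 0 at the origin).
        # Assumption: a stand-in for a real fitness such as -L in packing.
        return -sum(v * v + 2.0 * (1.0 - math.cos(3.0 * v)) for v in x)

    def generate_config(self) -> List[float]:
        # (1) Propose a valid initial solution.
        return [self.rng.uniform(-4.0, 4.0) for _ in range(2)]

    def improve(self, x: List[float], steps: int = 200,
                step_size: float = 0.1) -> List[float]:
        # (2) Improve a given solution; greedy search never returns a worse one.
        best, best_f = list(x), self.fitness(x)
        for _ in range(steps):
            cand = [v + self.rng.gauss(0.0, step_size) for v in best]
            cand_f = self.fitness(cand)
            if cand_f > best_f:
                best, best_f = cand, cand_f
        return best

    def perturb(self, x: List[float], intensity: float) -> List[float]:
        # (3) Perturb a solution with a specified intensity.
        return [v + self.rng.gauss(0.0, intensity) for v in x]


def run_schedule(solver: ToySolver, intensities: List[float]) -> List[float]:
    # Alternate perturb() and improve() over a schedule of intensities,
    # keeping a candidate only if it beats the incumbent (assumed rule).
    best = solver.improve(solver.generate_config())
    for sigma in intensities:
        cand = solver.improve(solver.perturb(best, sigma))
        if solver.fitness(cand) > solver.fitness(best):
            best = cand
    return best


if __name__ == "__main__":
    solver = ToySolver(seed=1)
    result = run_schedule(solver, [2.0, 1.0, 0.5, 0.25])
    print(result, solver.fitness(result))
```

In this sketch the outer loop is fixed and only the three methods would be subject to evolution, which is the separation of concerns the abstract describes.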

Paper Structure

This paper contains 34 sections, 4 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: Comparison of AlphaEvolve and ImprovEvolve. Top: In AlphaEvolve, the LLM evolves a program that directly outputs a candidate solution, requiring the model to design an end-to-end optimization algorithm including initialization, search strategy, and termination criteria. Bottom: In ImprovEvolve, the LLM evolves a class with three modular methods, generate_config (initialization), improve (local optimization), and perturb (exploration), which are combined via basin-hopping dynamics with a scheduled perturbation intensity. This decomposition reduces the cognitive burden on the LLM by separating distinct optimization concerns into tractable subproblems. The pseudocode shown for ImprovEvolve is simplified; see Algorithm 1 for the full description.
  • Figure 2: Illustration of the two-stage validation scheme (Algorithm 1) on the HEX 17 problem. Stage A (top): generate proposes initial configurations that typically contain overlapping hexagons (invalid); improve resolves these overlaps and produces valid packings. The best result ($L^* = 4.619$, gold border) is selected. Stage B (bottom): basin-hopping alternates perturb (red border) and improve. Even an extreme perturbation ($\sigma = 316$, round 4) that scatters hexagons across a vast enclosing hexagon ($L = 54.5$) is recovered by improve back to $L = 4.619$. The breakthrough occurs at round 8 ($\sigma = 56.2$), where improve discovers a new, tighter basin at $L = 4.6136$, a structural improvement inaccessible from the Stage A initialization. Subsequent rounds with small $\sigma$ confirm stability.
  • Figure 3: Two packings of $n = 11$ unit hexagons. (a) The previous state-of-the-art by AlphaEvolve (Novikov et al., 2025). (b) A structurally different packing found by ImprovEvolve that improves upon AlphaEvolve but is not the overall best (cf. \ref{fig:hex11}, $L = 3.9245$).
  • Figure 4: State-of-the-art hexagon packings discovered by ImprovEvolve. New best-known configurations are shown for $n = 11$--$17$ and $n = 23$. For $n = 25$--$30$, no prior packings have been reported; the discovered configurations exhibit structured, non-chaotic arrangements. The $n = 30$ packing (shown) achieves the same side length $L = 6.0$ as $n = 29$: any single hexagon can be removed to obtain a valid $n = 29$ packing with identical $L$. Side lengths are listed in \ref{tab:new_sota_transposed} and \ref{tab:hex_large_scale}.
  • Figure 5: Distribution of validation fitness values ($-L$) across all valid programs produced during evolution for the $n = 11$ hexagon packing problem. ImprovEvolve consistently yields higher-fitness programs compared to the GigaEvo baseline under identical evolutionary parameters and time limits.
  • ...and 4 more figures
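
The two-stage scheme described in the Figure 2 caption can be sketched on a toy problem. This is a rough illustration under stated assumptions, not the paper's Algorithm 1: the 1-D objective, the helper names generate, improve, perturb, and two_stage, the number of Stage A starts, the geometrically decreasing sigma schedule, and the accept-if-better rule are all hypothetical choices.

```python
import math
import random
from typing import List, Tuple


def fitness(x: float) -> float:
    # Toy rugged 1-D objective (higher is better, maximum 0 at x = 0).
    return -(x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x)))


def generate(rng: random.Random) -> float:
    # Propose an initial configuration.
    return rng.uniform(-5.0, 5.0)


def improve(x: float, rng: random.Random, steps: int = 300,
            step: float = 0.05) -> float:
    # Greedy local search: the result is never worse than the input.
    best, best_f = x, fitness(x)
    for _ in range(steps):
        cand = best + rng.gauss(0.0, step)
        cand_f = fitness(cand)
        if cand_f > best_f:
            best, best_f = cand, cand_f
    return best


def perturb(x: float, sigma: float, rng: random.Random) -> float:
    # Displace the configuration with intensity sigma.
    return x + rng.gauss(0.0, sigma)


def two_stage(n_init: int, sigmas: List[float],
              seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    # Stage A: several generate + improve runs; select the best result.
    candidates = [improve(generate(rng), rng) for _ in range(n_init)]
    best = max(candidates, key=fitness)
    stage_a_fitness = fitness(best)
    # Stage B: basin hopping, alternating perturb and improve over the
    # sigma schedule; a candidate replaces the incumbent only if better.
    for sigma in sigmas:
        cand = improve(perturb(best, sigma, rng), rng)
        if fitness(cand) > fitness(best):
            best = cand
    return stage_a_fitness, best


if __name__ == "__main__":
    # Assumed schedule: log-spaced intensities from 10 down to about 0.18.
    schedule = [10.0 ** (1.0 - 0.25 * k) for k in range(8)]
    stage_a, final = two_stage(n_init=5, sigmas=schedule, seed=0)
    print(stage_a, fitness(final))
```

Large early sigma values let the search leave the basin selected in Stage A, while small late values only refine the incumbent, loosely mirroring the behavior the caption reports for rounds 4 and 8.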