LLaMEA: A Large Language Model Evolutionary Algorithm for Automatically Generating Metaheuristics

Niki van Stein; Thomas Bäck

TL;DR

This work addresses the automatic design of metaheuristics for continuous black-box optimization by coupling Large Language Models with an evolutionary loop (LLaMEA). It evaluates candidate metaheuristics on the BBOB benchmark via IOHprofiler, using the AOCC metric to guide iterative improvement, and compares multiple LLMs and strategies against strong baselines. The results show that LLaMEA-generated algorithms, particularly those from GPT-4 with selective mutation, can outperform state-of-the-art baselines such as CMA-ES and DE on 5-dimensional problems, while scaling to higher dimensions remains a challenge. The study demonstrates the practicality of automated algorithm design with LLMs and outlines directions for more scalable, diverse, and robust auto-design frameworks.

Abstract

Large Language Models (LLMs) such as GPT-4 have demonstrated their ability to understand natural language and generate complex code snippets. This paper introduces a novel Large Language Model Evolutionary Algorithm (LLaMEA) framework, leveraging GPT models for the automated generation and refinement of algorithms. Given a set of criteria and a task definition (the search space), LLaMEA iteratively generates, mutates and selects algorithms based on performance metrics and feedback from runtime evaluations. This framework offers a unique approach to generating optimized algorithms without requiring extensive prior expertise. We show how this framework can be used to generate novel black-box metaheuristic optimization algorithms automatically. LLaMEA generates multiple algorithms that outperform state-of-the-art optimization algorithms (Covariance Matrix Adaptation Evolution Strategy and Differential Evolution) on the five-dimensional Black-Box Optimization Benchmark (BBOB). The algorithms also show competitive performance on the 10- and 20-dimensional instances of the test functions, although they have not seen such instances during the automated generation process. The results demonstrate the feasibility of the framework and identify future directions for automated generation and optimization of algorithms via LLMs.
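
The generate, mutate and select cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`evolve`, `toy_generate`, `toy_evaluate`) and the feedback prompt wording are hypothetical, the LLM call is replaced by a random stub, and the score is a toy stand-in for the AOCC value computed on BBOB. The one detail taken from the source is that an infeasible candidate (one that fails at runtime) receives a score of 0.

```python
import random
from typing import Callable, Tuple


def evolve(
    generate: Callable[[str], str],
    evaluate: Callable[[str], float],
    task_prompt: str,
    budget: int,
    elitist: bool = True,
) -> Tuple[str, float]:
    """Hypothetical one-parent, one-offspring loop.

    elitist=True keeps the parent unless the child is at least as good,
    in the spirit of a (1+1) strategy; elitist=False always replaces
    the parent, in the spirit of a (1,1) strategy.
    """

    def safe_score(candidate: str) -> float:
        try:
            return evaluate(candidate)
        except Exception:
            return 0.0  # infeasible candidates score 0

    # Initialization: ask the generator for a first candidate.
    parent = generate(task_prompt)
    parent_score = safe_score(parent)
    best, best_score = parent, parent_score

    for _ in range(budget - 1):
        # Feedback: show the current candidate and its score (assumed wording).
        feedback = (
            f"{task_prompt}\nCurrent algorithm:\n{parent}\n"
            f"Score: {parent_score:.4f}\nRefine it."
        )
        child = generate(feedback)
        child_score = safe_score(child)
        # Selection between parent and child.
        if not elitist or child_score >= parent_score:
            parent, parent_score = child, child_score
        # Track the best-so-far candidate independently of the strategy.
        if child_score > best_score:
            best, best_score = child, child_score
    return best, best_score


# Toy stand-ins so the sketch runs without an LLM or a benchmark suite.
_rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    return f"step_size={_rng.uniform(0.0, 1.0):.3f}"


def toy_evaluate(candidate: str) -> float:
    value = float(candidate.split("=")[1])
    if value > 0.9:
        raise ValueError("simulated runtime failure")
    return 1.0 - abs(value - 0.5)


if __name__ == "__main__":
    print(evolve(toy_generate, toy_evaluate, "Design an optimizer.", budget=20))
```

In a real setup, `generate` would wrap a call to an LLM and `evaluate` would run the generated code on the benchmark and return its mean score; both are left as stubs here because the page does not specify those interfaces.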

Paper Structure (23 sections, 7 equations, 17 figures, 3 tables, 1 algorithm)

Introduction
Related Work
LLaMEA
Starting Prompt
Algorithm Synthesis (Initialization)
Evaluation
Mutation, Selection and Feedback
Experimental Setup
Large Language Models
Benchmark Problems
Performance Metrics
Baselines
Results and Discussion
Novelty and diversity
Mutation Rates
...and 8 more sections

Figures (17)

Figure 1: The summary of the proposed LLM driven algorithm design framework LLaMEA. Full details of all steps are provided in the corresponding sections.
Figure 2: Mean convergence curves (best-so-far algorithm scores) over the $5$ different runs for each LLM and selection strategy. Shaded areas denote the standard deviation of the best-so-far. Please note that the difference in initial performance already results from the fact that, although the starting prompt $S$ is identical for all LLMs, the performance value shown here is the mean $AOCC$ value of the first algorithm $a_1$ generated by each LLM in line 4 of Algorithm 1. Notice that only best-so-far values are plotted, also for (1,1)-strategies, and infeasible algorithm results are also not plotted (as they would have an $AOCC = 0$ value as mentioned in Algorithm 1).
Figure 3: Mean convergence curves (best-so-far algorithm scores) over the $5$ different runs for the best strategy LLaMEA-variant, i.e., LLaMEA-(1+1) GPT-4 (same curve as in Fig. 2), including the state-of-the-art baseline EoH algorithm (red) and the random search (RS) baseline (black). Shaded areas denote the standard deviation of the best-so-far over $5$ runs. Additional remarks as provided in the caption of Fig. 2 apply here, too.
Figure 4: Word cloud of algorithm name parts generated over all different LLaMEA runs.
Figure 5: Pairwise differences between parent and offspring for each iteration. Solid lines represent the mean over all runs per model and strategy; more transparent lines are individual runs. "w/ Details" denotes that we use a feedback mechanism that provides not just the plain average $AOCC$ but also the average $AOCC$ per BBOB function group as feedback to the LLM.
...and 12 more figures
