LLaMEA-BO: A Large Language Model Evolutionary Algorithm for Automatically Generating Bayesian Optimization Algorithms

Wenhu Li, Niki van Stein, Thomas Bäck, Elena Raponi

TL;DR

This work automates the design of Bayesian optimization algorithms by extending the LLaMEA framework to BO, using an evolution-driven prompting loop to generate complete Python implementations embedded in a BO template. Evaluated on the COCO BBOB benchmarks and Bayesmark, the best LLaMEA-BO algorithms match or exceed state-of-the-art baselines across multiple dimensions and tasks, demonstrating robust generalization to unseen settings. The results illustrate that LLMs can serve as algorithmic co-designers, accelerating discovery of novel BO configurations without additional fine-tuning. The study also discusses limitations and future directions, including diversification of prompts and broader benchmark incorporation.

Abstract

Bayesian optimization (BO) is a powerful class of algorithms for optimizing expensive black-box functions, but designing effective BO algorithms remains a manual, expertise-driven task. Recent advancements in Large Language Models (LLMs) have opened new avenues for automating scientific discovery, including the automatic design of optimization algorithms. While prior work has used LLMs within optimization loops or to generate non-BO algorithms, we tackle a new challenge: using LLMs to automatically generate full BO algorithm code. Our framework uses an evolution strategy to guide an LLM in generating Python code that preserves the key components of BO algorithms: an initial design, a surrogate model, and an acquisition function. The LLM is prompted to produce multiple candidate algorithms, which are evaluated on the established Black-Box Optimization Benchmarking (BBOB) test suite from the COmparing Continuous Optimizers (COCO) platform. Based on their performance, top candidates are selected, combined, and mutated via controlled prompt variations, enabling iterative refinement. Despite no additional fine-tuning, the LLM-generated algorithms outperform state-of-the-art BO baselines on 19 of the 24 BBOB functions in dimension 5 and generalize well to higher dimensions and to different tasks (from the Bayesmark framework). This work demonstrates that LLMs can serve as algorithmic co-designers, offering a new paradigm for automating BO development and accelerating the discovery of novel algorithmic combinations. The source code is provided at https://github.com/Ewendawi/LLaMEA-BO.
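To make the three BO components named in the abstract concrete, the following is a minimal sketch of a generic BO loop. It is not the template or any algorithm from the paper. All names (`rbf_kernel`, `gp_posterior`, `bayes_opt`, `sphere`) and all settings (uniform random initial design, a fixed-length-scale Gaussian process, a lower-confidence-bound acquisition maximized by random candidate search) are assumptions chosen for brevity.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential kernel between two sets of points (assumed fixed length scale).
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # Surrogate model: Gaussian-process posterior mean and standard deviation.
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    solve = np.linalg.solve(k_tt, k_tq)  # K^{-1} k_*
    y_mean = y_train.mean()
    mean = y_mean + solve.T @ (y_train - y_mean)
    var = 1.0 - np.sum(k_tq * solve, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def bayes_opt(objective, dim, budget, n_init=5, kappa=2.0, seed=0):
    rng = np.random.default_rng(seed)
    # 1) Initial design: uniform random points in [0, 1]^dim (illustrative choice).
    x = rng.uniform(size=(n_init, dim))
    y = np.array([objective(p) for p in x])
    while len(y) < budget:
        # 2) Surrogate model conditioned on all evaluations so far.
        candidates = rng.uniform(size=(512, dim))
        mean, std = gp_posterior(x, y, candidates)
        # 3) Acquisition function: lower confidence bound for minimization.
        best = candidates[np.argmin(mean - kappa * std)]
        x = np.vstack([x, best])
        y = np.append(y, objective(best))
    return x[np.argmin(y)], y.min()

def sphere(p):
    # Hypothetical toy objective with its minimum at (0.3, ..., 0.3).
    return float(np.sum((p - 0.3) ** 2))

best_x, best_y = bayes_opt(sphere, dim=2, budget=25)
```

Each numbered step is one place where a generated algorithm could differ: a different sampling scheme, a different surrogate, or a different acquisition rule.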

Paper Structure

This paper contains 45 sections, 3 equations, 25 figures, 3 tables, 2 algorithms.

Figures (25)

  • Figure 1: LLaMEA-BO's performance on selected BBOB functions in terms of AOCC and generated algorithm loss over time. Shaded areas denote the standard error.
  • Figure 2: Best algorithm evaluation based on AOCC: Violin plots aggregating over 24 functions, 3 instances, 5 runs.
  • Figure 3: Hyperparameter tuning task results. Performance is reported in terms of regret and separately for Bayesmark public datasets (top row) and synthetic tasks (bottom row). Convergence curves are aggregated over $5$ independent runs. Solid lines correspond to our generated algorithms. All algorithms are initialized from the same $5$ samples, whose regret is not reported in the plots.
  • Figure 4: Results from different population sizes and elitism configurations averaged over $10$ BBOB functions and $5$ repetitions per function.
  • Figure 5: Results for crossover rates $[0.3, 0.6, 0.9]$ (using a $(4+8)$ ES strategy), averaged over $10$ BBOB functions and $4$ repetitions per function.
  • ...and 20 more figures
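
The $(4+8)$ notation in the Figure 5 caption is the standard $(\mu+\lambda)$ evolution-strategy notation: 4 parents produce 8 offspring, and the best 4 of the combined 12 survive. The sketch below illustrates that selection scheme only. The candidates are plain floats, and `init`, `mutate`, `crossover`, and `evaluate` are hypothetical stand-ins for what, in an LLM-driven setting, would be prompt-based code generation and benchmark scoring. Treating the crossover rate as a per-offspring probability is likewise an assumption.

```python
import random

def evolve(evaluate, mutate, crossover, init, mu=4, lam=8,
           crossover_rate=0.6, generations=10, seed=0):
    rng = random.Random(seed)
    # Initial parent population of mu candidates, each scored once.
    parents = [(evaluate(c), c) for c in (init(rng) for _ in range(mu))]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            if rng.random() < crossover_rate:
                # Combine two distinct parents.
                (_, a), (_, b) = rng.sample(parents, 2)
                child = crossover(a, b, rng)
            else:
                # Vary a single parent.
                _, a = rng.choice(parents)
                child = mutate(a, rng)
            offspring.append((evaluate(child), child))
        # Plus selection: keep the best mu of parents and offspring (higher is better).
        parents = sorted(parents + offspring, key=lambda s: s[0], reverse=True)[:mu]
    return parents[0]

# Hypothetical stand-ins; a real setup would generate and score algorithm code.
def toy_init(rng):
    return rng.uniform(-5.0, 5.0)

def toy_mutate(x, rng):
    return x + rng.gauss(0.0, 0.5)

def toy_crossover(a, b, rng):
    return 0.5 * (a + b)

def toy_evaluate(x):
    return -(x - 1.0) ** 2

best_score, best_candidate = evolve(toy_evaluate, toy_mutate, toy_crossover, toy_init)
```

Because parents compete with their offspring, the best score never decreases from one generation to the next, which is the elitism property that the "+" in $(\mu+\lambda)$ denotes.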