Large Language Models As Evolution Strategies

Robert Tjarko Lange, Yingtao Tian, Yujin Tang

TL;DR

This work studies the zero-shot optimization of a black-box function $f:\mathbb{R}^D\to\mathbb{R}$ using large language models. It introduces EvoLLM, which prompts the LLM to act as the ES update operator by using a discretized, fitness-sorted history $H$ to propose a new mean $x^\star$. The paper shows that EvoLLM outperforms baselines on BBOB and neuroevolution tasks, with ablations clarifying the importance of discretization and prompt design, and demonstrates gains from instruction fine-tuning on teacher trajectories. This suggests that text-trained LLMs can serve as plug-in, in-context recombination operators for derivative-free optimization, with scalable block-wise querying and potential for distilling teacher strategies.

Abstract

Large Transformer models are capable of implementing a plethora of so-called in-context learning algorithms. These include gradient descent, classification, sequence completion, transformation, and improvement. In this work, we investigate whether large language models (LLMs), which never explicitly encountered the task of black-box optimization, are in principle capable of implementing evolutionary optimization algorithms. While previous works have solely focused on language-based task specification, we move forward and focus on the zero-shot application of LLMs to black-box optimization. We introduce a novel prompting strategy, consisting of least-to-most sorting of discretized population members and querying the LLM to propose an improvement to the mean statistic, i.e., to perform a type of black-box recombination operation. Empirically, we find that our setup allows the user to obtain an LLM-based evolution strategy, which we call 'EvoLLM', that robustly outperforms baseline algorithms such as random search and Gaussian Hill Climbing on synthetic BBOB functions as well as small neuroevolution tasks. Hence, LLMs can act as 'plug-in' in-context recombination operators. We provide several comparative studies of the LLM's model size, prompt strategy, and context construction. Finally, we show that one can flexibly improve EvoLLM's performance by providing teacher algorithm information via instruction fine-tuning on previously collected teacher optimization trajectories.
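The prompting strategy described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual prompt format: the search range `LOW`/`HIGH`, the resolution `BINS`, the line layout, the trailing `next mean:` cue, and all function names are assumptions made for illustration. The sketch discretizes each solution to integers, orders the evaluated solutions from worst to best (assuming minimization, so the best entry comes last), and converts an integer reply back into a real-valued mean.

```python
from typing import List, Sequence, Tuple

# Hypothetical search range and discretization resolution (not from the paper).
LOW, HIGH, BINS = -3.0, 3.0, 1000


def discretize(x: Sequence[float]) -> List[int]:
    """Map each coordinate, clipped to [LOW, HIGH], to an integer in [0, BINS]."""
    out = []
    for v in x:
        clipped = min(max(v, LOW), HIGH)
        out.append(round((clipped - LOW) / (HIGH - LOW) * BINS))
    return out


def undiscretize(bins: Sequence[int]) -> List[float]:
    """Map integer bins back to real-valued coordinates."""
    return [LOW + b / BINS * (HIGH - LOW) for b in bins]


def build_prompt(history: List[Tuple[Sequence[float], float]]) -> str:
    """Render evaluated solutions worst-to-best (minimization), then ask for a mean."""
    ordered = sorted(history, key=lambda item: item[1], reverse=True)
    lines = []
    for x, fitness in ordered:
        coords = ",".join(str(b) for b in discretize(x))
        lines.append(f"{coords} -> {fitness:.2f}")
    lines.append("next mean:")  # assumed cue for the model to complete
    return "\n".join(lines)


def parse_mean(reply: str, dim: int) -> List[float]:
    """Parse a comma-separated integer reply into a real-valued mean vector."""
    bins = [int(token) for token in reply.strip().split(",")[:dim]]
    if len(bins) != dim:
        raise ValueError("reply does not contain enough coordinates")
    return undiscretize(bins)
```

In a full loop, the string returned by `build_prompt` would be sent to the LLM and its completion passed to `parse_mean`; the resulting mean would then be perturbed to sample the next population.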

Paper Structure (19 sections, 2 equations, 13 figures, 1 table)


Figures (13)

  • Figure 1: EvoLLM prompt design space & API. We track all solution evaluations and their performance in a context buffer. The buffer is used to construct query prompts for the LLM. After parsing the LLM output and sampling, we evaluate the resulting population and add the new information to the buffer. We provide an example of the generated prompts in the appendix.
  • Figure 2: Dimension-batched Querying of an LLM. As the search space dimensionality grows, the prompt can exceed the context length the LLM can handle. We split the solution space into blocks and perform multiple LLM queries per update.
  • Figure 3: EvoLLM performance (lower is better) on BBOB [hansen2010real] functions with single LLM query. We compare different LLM base models (marked in the lower box in the legend) and find that the behavior of EvoLLM is robust to the exact choice of LLM. The results are averaged across 10 independent runs.
  • Figure 4: EvoLLM performance (lower is better) on BBOB [hansen2010real] functions with multi-dimensional LLM query splits. We consider text-davinci-003 and PaLM2-XS as base LLM models and find that performance does not degrade when using splits. Top: 10-dimensional Sphere problem. Bottom: 10-dimensional Rosenbrock problem. Averaged results over 5 independent runs.
  • Figure 5: EvoLLM performance (higher is better) on CartPole & Acrobot [brockman2016openai, gymnax2022github] control tasks with different neural network architectures. LLM-based ES can optimize small networks and even outperform baselines in the small evaluation budget regime. Averaged results over 5 independent runs.
  • ...and 8 more figures
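
The buffer loop of Figure 1 and the dimension-batched querying of Figure 2 can be sketched together as follows. This is an illustrative outline under stated assumptions, not the paper's implementation: `stub_llm_mean` is a hypothetical stand-in that simply returns the best block seen so far where a real setup would query an LLM, and `block_size`, `popsize`, and `sigma` are assumed hyperparameter names.

```python
import random
from typing import Callable, List, Tuple

Solution = List[float]
Buffer = List[Tuple[Solution, float]]


def sphere(x: Solution) -> float:
    """Toy objective to minimize."""
    return sum(v * v for v in x)


def split_blocks(dim: int, block_size: int) -> List[range]:
    """Partition the indices 0..dim-1 into consecutive blocks."""
    return [range(s, min(s + block_size, dim)) for s in range(0, dim, block_size)]


def stub_llm_mean(block_history: Buffer) -> Solution:
    """Hypothetical stand-in for one LLM query: returns the best block so far."""
    return list(min(block_history, key=lambda item: item[1])[0])


def es_step(
    buffer: Buffer,
    f: Callable[[Solution], float],
    block_size: int,
    popsize: int,
    sigma: float,
    rng: random.Random,
) -> Solution:
    """One update: query per block, assemble the mean, sample, evaluate, store."""
    dim = len(buffer[0][0])
    mean = [0.0] * dim
    for block in split_blocks(dim, block_size):
        # Each query only sees the coordinates that belong to its block.
        block_history = [([x[i] for i in block], fit) for x, fit in buffer]
        for i, value in zip(block, stub_llm_mean(block_history)):
            mean[i] = value
    # Sample a population around the proposed mean and add it to the buffer.
    population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(popsize)]
    buffer.extend((x, f(x)) for x in population)
    return mean
```

Splitting by blocks keeps each prompt short at the cost of one query per block per update; the stub ignores interactions between blocks, which a real query could also only see through the shared fitness values.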