Large Language Models as In-context AI Generators for Quality-Diversity

Bryan Lim, Manon Flageat, Antoine Cully

TL;DR

This work tackles open-ended search in Quality-Diversity (QD) by introducing In-context QD, which uses Large Language Models (LLMs) to generate diverse, high-quality solutions conditioned on an archive of elites. The method encodes solutions into a prompt, builds a context from a subset of the archive, and employs a query strategy to steer generation. It is tested on BBOB benchmarks, a redundant robotic arm, and a 24-parameter hexapod policy task, across archive sizes $C \in \{400, 1600\}$ and parameter-space dimensions $D \in \{5, 10, 24\}$. Results show competitive or superior performance to MAP-Elites and Random baselines, with ablations confirming the importance of prompt design, context structure, and context size. The findings illustrate the potential of pattern-matching LLMs as AI generators for QD, offering a scalable, domain-general approach while highlighting limitations related to feature-space openness and context length.
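
The three steps named above (encoding solutions as text, building a context from a subset of the archive, and appending a query) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the line format, the fitness-ascending ordering, the `target_fitness` argument, and the names `encode_solution`, `build_prompt`, and `parse_params` are all hypothetical.

```python
import random
import re


def encode_solution(params, features, fitness, precision=2):
    """Render one elite as a single text line (hypothetical format)."""
    p = ", ".join(f"{x:.{precision}f}" for x in params)
    f = ", ".join(f"{x:.{precision}f}" for x in features)
    return f"features: [{f}] fitness: {fitness:.{precision}f} params: [{p}]"


def build_prompt(archive, context_size, target_features, target_fitness,
                 rng, precision=2):
    """Concatenate a context of sampled elites with an open-ended query line.

    archive maps a niche index to a (params, features, fitness) tuple.
    """
    elites = list(archive.values())
    context = rng.sample(elites, min(context_size, len(elites)))
    # Assumption: order the context by ascending fitness so the best
    # examples sit closest to the query.
    context.sort(key=lambda elite: elite[2])
    lines = [encode_solution(*elite, precision=precision) for elite in context]
    t = ", ".join(f"{x:.{precision}f}" for x in target_features)
    # The query leaves the parameter list open for the model to complete.
    lines.append(f"features: [{t}] fitness: {target_fitness:.{precision}f} params: [")
    return "\n".join(lines)


def parse_params(completion, dim):
    """Extract the first `dim` numbers from a completion, or None if too few."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if len(numbers) < dim:
        return None
    return [float(n) for n in numbers[:dim]]
```

A candidate parsed from the completion would then be evaluated and considered for insertion into the archive like any other QD offspring. Malformed completions return `None` and can simply be discarded.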

Abstract

Quality-Diversity (QD) approaches are a promising direction to develop open-ended processes as they can discover archives of high-quality solutions across diverse niches. While already successful in many applications, QD approaches usually rely on combining only one or two solutions to generate new candidate solutions. As observed in open-ended processes such as technological evolution, wisely combining a large diversity of these solutions could lead to more innovative solutions and potentially boost the productivity of QD search. In this work, we propose to exploit the pattern-matching capabilities of generative models to enable such efficient solution combinations. We introduce In-context QD, a framework of techniques that aim to elicit the in-context capabilities of pre-trained Large Language Models (LLMs) to generate interesting solutions using few-shot and many-shot prompting with quality-diverse examples from the QD archive as context. Applied to a series of common QD domains, In-context QD displays promising results compared to both QD baselines and similar strategies developed for single-objective optimization. Additionally, this result holds across multiple parameter-space dimensions and archive population sizes, as well as across domains with distinct characteristics, from BBO functions to policy search. Finally, we perform an extensive ablation that highlights the key prompt design considerations that encourage the generation of promising solutions for QD.

Paper Structure (14 sections, 1 equation, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Overview of In-context QD using LLMs as idea and solution generators. The LLM observes the current state of the archive which contains high-quality and diverse solution examples and uses this to generate new solutions to be evaluated and considered for addition to the archive. The prompt consists of context and query portions which are concatenated together.
  • Figure 2: Performance comparison over QD-Score, Coverage and Max-Fitness across all tasks considered, where $D$ is the dimensionality of the parameter space and $C$ is the number of niches in the archive. The results are averaged over 5 independent runs.
  • Figure 3: Evolution of archives across generations, showing the different optimization paths and dynamics of In-context QD, which can quickly exploit patterns for regions of high performance.
  • Figure 4: QD-Score of In-context QD across a variety of parameter space dimensionality $D$ and archive sizes $C$.
  • Figure 5: Performance comparison between different prompt templates (see Table \ref{tab:prompt-template}).
  • ...and 2 more figures
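
The three metrics named in the Figure 2 caption can be sketched as below, using their usual definitions in the QD literature: QD-Score as the sum of elite fitnesses, Coverage as the fraction of filled niches, and Max-Fitness as the best fitness in the archive. The function name `qd_metrics`, the dictionary layout, and the optional `fitness_offset` are assumptions for illustration; the exact normalization used in the paper is not stated in this excerpt.

```python
def qd_metrics(fitness_by_niche, num_niches, fitness_offset=0.0):
    """Compute QD-Score, Coverage and Max-Fitness for a grid archive.

    fitness_by_niche maps each filled niche index to its elite's fitness.
    num_niches is the total number of niches C in the archive.
    fitness_offset is an assumed shift that keeps fitnesses non-negative,
    so that filling a new niche never lowers the QD-Score.
    """
    fitnesses = list(fitness_by_niche.values())
    if not fitnesses:
        return {"qd_score": 0.0, "coverage": 0.0, "max_fitness": None}
    return {
        "qd_score": sum(f + fitness_offset for f in fitnesses),
        "coverage": len(fitnesses) / num_niches,
        "max_fitness": max(fitnesses),
    }
```

For example, an archive with $C = 400$ niches of which two are filled has a Coverage of $2/400$, regardless of how good those two elites are.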