LLM-POET: Evolving Complex Environments using Large Language Models

Fuma Aki, Riku Ikeda, Takumi Saito, Ciaran Regan, Mizuki Oka

TL;DR

The paper tackles the challenge of continuously generating complex environments for open-ended learning. It introduces LLM-POET, which replaces the CPPN-based environment generation of Enhanced-POET with a Large Language Model fine-tuned on environment-caption pairs, so that Evolution Gym environments can be produced from natural language prompts; mutations are handled with a few-shot prompting strategy. Compared with Enhanced-POET, LLM-POET yields a 34% increase in the performance gain of co-evolution, which the authors attribute to more diverse and complex environments that let agents learn a more diverse set of skills. The results suggest that language models are a promising way to drive open-ended learning in settings that require evolving environments.

Abstract

Creating systems capable of generating virtually infinite variations of complex and novel behaviour without predetermined goals or limits is a major challenge in the field of AI. This challenge has been addressed through the development of several open-ended algorithms that can continuously generate new and diverse behaviours, such as the POET and Enhanced-POET algorithms for co-evolving environments and agent behaviour. One of the challenges with existing methods, however, is that they struggle to continuously generate complex environments. In this work, we propose LLM-POET, a modification of the POET algorithm where the environment is both created and mutated using a Large Language Model (LLM). By fine-tuning an LLM with text representations of Evolution Gym environments and captions that describe the environment, we were able to generate complex and diverse environments using natural language. We found that not only could the LLM produce a diverse range of environments, but compared to the CPPNs used in Enhanced-POET for environment generation, the LLM allowed for a 34% increase in the performance gain of co-evolution. This increased performance suggests that the agents were able to learn a more diverse set of skills by training on more complex environments.
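
To make the environment-caption pairing concrete, the following is a minimal sketch of how a grid environment might be serialized to text, paired with a caption for fine-tuning, and how a few-shot mutation prompt might be assembled. The symbol set, the prompt layout, and every function name here are hypothetical; the abstract does not specify the paper's actual encoding or prompt format.

```python
from dataclasses import dataclass

# Hypothetical symbols for voxel types; the real text encoding is an assumption.
EMPTY, RIGID, SOFT = "-", "H", "S"


@dataclass(frozen=True)
class Sample:
    """One environment-caption pair (illustrative structure only)."""
    caption: str
    env_text: str


def grid_to_text(grid):
    """Serialize a 2D grid (list of rows of symbols) into newline-separated text."""
    return "\n".join("".join(row) for row in grid)


def make_training_example(sample):
    """Format a caption and its environment text as one fine-tuning string."""
    return f"Caption: {sample.caption}\nEnvironment:\n{sample.env_text}\n"


def make_mutation_prompt(examples, parent_caption):
    """Build a few-shot prompt from (original, mutated) caption pairs.

    The prompt ends with the parent caption so that a language model
    would be asked to complete the mutated caption.
    """
    shots = "\n".join(f"Original: {a}\nMutated: {b}" for a, b in examples)
    return f"{shots}\nOriginal: {parent_caption}\nMutated:"


# Illustrative data, not taken from the paper.
grid = [
    [EMPTY, EMPTY, EMPTY],
    [RIGID, SOFT, RIGID],
]
sample = Sample(caption="flat ground with a soft tile", env_text=grid_to_text(grid))
training_text = make_training_example(sample)
prompt = make_mutation_prompt(
    [("flat ground", "flat ground with a small gap")],
    sample.caption,
)
```

In this sketch the mutation acts on the caption rather than on the grid itself, which is one plausible reading of a prompt-based mutation; the mutated caption would then be passed to the fine-tuned model to generate the child environment.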

Paper Structure (7 sections, 7 figures)

Figures (7)

  • Figure 1: An overview of LLM-POET.
  • Figure 2: Overview of LLM fine-tuning.
  • Figure 3: Environments generated using a variety of prompts with the fine-tuned LLM.
  • Figure 4: Mutating environments with the original or mutated prompts.
  • Figure 5: Calculating score differences. The difference between PPO only score (left) and the POET score (right) quantifies the ability of the environment generation model to create diverse environments, leading to improved agent learning.
  • ...and 2 more figures
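
The score difference described in the Figure 5 caption can be sketched as follows. This assumes that the performance gain is the mean per-environment difference between the POET score and the PPO-only score, and that a figure such as "34% increase" is a relative increase of one gain over another; both readings are assumptions, since the exact evaluation protocol is not stated here. The function names and all numbers are placeholders, not results from the paper.

```python
from statistics import mean


def performance_gain(ppo_only_scores, poet_scores):
    """Mean per-environment difference: POET score minus PPO-only score."""
    if len(ppo_only_scores) != len(poet_scores):
        raise ValueError("score lists must cover the same environments")
    return mean(poet - ppo for ppo, poet in zip(ppo_only_scores, poet_scores))


def relative_increase(baseline_gain, new_gain):
    """Relative increase of new_gain over baseline_gain, as a fraction."""
    if baseline_gain == 0:
        raise ValueError("baseline gain must be non-zero")
    return (new_gain - baseline_gain) / baseline_gain


# Placeholder scores chosen for easy arithmetic (NOT values from the paper).
cppn_gain = performance_gain([1.0, 2.0], [3.0, 4.0])  # gains of 2.0 and 2.0
llm_gain = performance_gain([1.0, 2.0], [4.0, 5.0])   # gains of 3.0 and 3.0

print(relative_increase(cppn_gain, llm_gain))  # prints 0.5, i.e. a 50% increase
```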