PCGRLLM: Large Language Model-Driven Reward Design for Procedural Content Generation Reinforcement Learning

In-Chang Baek, Sung-Hyun Kim, Sam Earle, Zehua Jiang, Noh Jin-Ha, Julian Togelius, Kyung-Joong Kim

TL;DR

The paper addresses the bottleneck of reward design in procedural content generation via reinforcement learning (PCGRL) by introducing PCGRLLM, a framework that uses a feedback loop and reasoning-based prompt engineering to autonomously generate and refine reward functions. It extends prior work by incorporating self-alignment with the environment and content-informed feedback, enabling iterative improvement of rewards through Tree-of-Thought (ToT) and Graph-of-Thought (GoT) style reasoning. Empirical results on a story-to-reward generation task in a 2D PCGRL setting show substantial gains in reward-generation accuracy, with improvements of up to 415% depending on the zero-shot capability of the LLM, and indicate that the method generalizes across language models. The work highlights the potential to reduce human intervention in game AI development and to enhance creative processes in content generation, while also exploring content-evaluation challenges and vision-assisted feedback as avenues for future improvement.

Abstract

Reward design plays a pivotal role in the training of game AIs, requiring substantial domain-specific knowledge and human effort. In recent years, several studies have explored reward generation for training game agents and controlling robots using large language models (LLMs). In the content generation literature, there has been early work on generating reward functions for reinforcement learning agent generators. This work introduces PCGRLLM, an extended architecture based on earlier work, which employs a feedback mechanism and several reasoning-based prompt engineering techniques. We evaluate the proposed method on a story-to-reward generation task in a two-dimensional environment using two state-of-the-art LLMs, demonstrating the generalizability of our approach. Our experiments provide insightful evaluations that demonstrate the capabilities of LLMs essential for content generation tasks. The results highlight significant performance improvements of 415% and 40% respectively, depending on the zero-shot capabilities of the language model. Our work demonstrates the potential to reduce human dependency in game AI development, while supporting and enhancing creative processes.

Paper Structure

This paper contains 29 sections, 1 equation, 9 figures, 6 tables, 2 algorithms.

Figures (9)

  • Figure 1: An overview of the reward generation process: (1) instructions guide the LLM, (2) outputs direct the agent, (3) environment interactions refine rewards, and (4) feedback analyzes content for improvement.
  • Figure 2: An architectural comparison of three prompt engineering techniques, along with details of the thought nodes. Each thought node includes a reward function ($R$) and a fitness value ($f$); the fitness value is the evaluated score of the content generated by the agent trained with that reward function. In the Tree- and Graph-of-Thought methods, the parent node is selected based on the fitness value.
  • Figure 3: Architecture of the PCGRLLM framework. "Message icons" indicate the use of the language model ($\mathcal{M}$) in the context. Refer to Section \ref{sec:method} for a detailed description.
  • Figure 4: Level images generated during the iterative reward generation process for the given instructions. Each map corresponds to an iteration ($y$), the number of times the reward has been generated and revised by the LLM, and is produced by an agent trained with the LLM-generated reward function of that iteration. The asterisk (*) denotes that the generated level satisfies the given instructions.
  • Figure 5: (a) The dotted lines represent the solution paths that can reach the key. (b) The player encounters enemies within the yellow dotted box.
  • ...and 4 more figures
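
The thought-node structure described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ThoughtNode`, `select_parent`, and `refine`, and the `propose` and `evaluate` callbacks (standing in for the LLM call and for training plus content evaluation), are hypothetical. Choosing the highest-fitness node as the parent is also an assumption; the caption only states that the parent is selected based on the fitness value.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ThoughtNode:
    """One thought node: a reward function (R) and its fitness value (f)."""
    reward_fn: str                       # source text of the reward function R
    fitness: float                       # f: evaluated score of the generated content
    parent: Optional["ThoughtNode"] = None
    children: List["ThoughtNode"] = field(default_factory=list)


def select_parent(nodes: List[ThoughtNode]) -> ThoughtNode:
    """Assumed rule: greedily pick the node with the highest fitness."""
    return max(nodes, key=lambda n: n.fitness)


def refine(
    root: ThoughtNode,
    propose: Callable[[ThoughtNode], str],   # hypothetical: LLM revises the parent's reward
    evaluate: Callable[[str], float],        # hypothetical: train agent, score its content
    iterations: int,
) -> ThoughtNode:
    """Tree-style refinement: expand the best node so far, keep every node."""
    nodes = [root]
    for _ in range(iterations):
        parent = select_parent(nodes)
        reward_fn = propose(parent)
        child = ThoughtNode(reward_fn, evaluate(reward_fn), parent=parent)
        parent.children.append(child)
        nodes.append(child)
    return select_parent(nodes)
```

In this sketch every generated node stays available for later selection, so a poorly scoring revision does not discard an earlier, better reward function. A graph-style variant would additionally let a new node draw on more than one earlier node.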