Table of Contents
Fetching ...

ChatGPT4PCG 2 Competition: Prompt Engineering for Science Birds Level Generation

Pittawat Taveekitworachai, Febri Abdullah, Mury F. Dewantoro, Yi Xia, Pratch Suntichaikul, Ruck Thawonmas, Julian Togelius, Jochen Renz

TL;DR

This paper advances prompt engineering for PCG by presenting ChatGPT4PCG 2, a CoG 2024 competition edition that introduces a diversity metric to combat repetitive level generation and enables Python-based PE submissions for more complex control-flow prompts. It upgrades the evaluation pipeline with a new Vision Transformer classifier trained on a font-based dataset and a manually crafted evaluation set, and formalizes eight evaluation stages including a new diversity check. The study analyzes the impact of function signatures on ChatGPT performance, demonstrates the effectiveness of a format-guiding prompt, and evaluates a spectrum of PE techniques from zero-shot to ToT prompting, highlighting that multi-turn prompts with explicit formatting yield strong results. Overall, the work provides a resource-rich platform for learning and advancing PE for Science Birds level generation, with open-source PE implementations to foster ongoing research and education in PE and PCG.

Abstract

This paper presents the second ChatGPT4PCG competition at the 2024 IEEE Conference on Games. In this edition of the competition, we follow the first edition, but make several improvements and changes. We introduce a new evaluation metric along with allowing a more flexible format for participants' submissions and making several improvements to the evaluation pipeline. Continuing from the first edition, we aim to foster and explore the realm of prompt engineering (PE) for procedural content generation (PCG). While the first competition saw success, it was hindered by various limitations; we aim to mitigate these limitations in this edition. We introduce diversity as a new metric to discourage submissions aimed at producing repetitive structures. Furthermore, we allow submission of a Python program instead of a prompt text file for greater flexibility in implementing advanced PE approaches, which may require control flow, including conditions and iterations. We also make several improvements to the evaluation pipeline with a better classifier for similarity evaluation and better-performing function signatures. We thoroughly evaluate the effectiveness of the new metric and the improved classifier. Additionally, we perform an ablation study to select a function signature to instruct ChatGPT for level generation. Finally, we provide implementation examples of various PE techniques in Python and evaluate their preliminary performance. We hope this competition serves as a resource and platform for learning about PE and PCG in general.

ChatGPT4PCG 2 Competition: Prompt Engineering for Science Birds Level Generation

TL;DR

This paper advances prompt engineering for PCG by presenting ChatGPT4PCG 2, a CoG 2024 competition edition that introduces a diversity metric to combat repetitive level generation and enables Python-based PE submissions for more complex control-flow prompts. It upgrades the evaluation pipeline with a new Vision Transformer classifier trained on a font-based dataset and a manually crafted evaluation set, and formalizes eight evaluation stages including a new diversity check. The study analyzes the impact of function signatures on ChatGPT performance, demonstrates the effectiveness of a format-guiding prompt, and evaluates a spectrum of PE techniques from zero-shot to ToT prompting, highlighting that multi-turn prompts with explicit formatting yield strong results. Overall, the work provides a resource-rich platform for learning and advancing PE for Science Birds level generation, with open-source PE implementations to foster ongoing research and education in PE and PCG.

Abstract

This paper presents the second ChatGPT4PCG competition at the 2024 IEEE Conference on Games. In this edition of the competition, we follow the first edition, but make several improvements and changes. We introduce a new evaluation metric along with allowing a more flexible format for participants' submissions and making several improvements to the evaluation pipeline. Continuing from the first edition, we aim to foster and explore the realm of prompt engineering (PE) for procedural content generation (PCG). While the first competition saw success, it was hindered by various limitations; we aim to mitigate these limitations in this edition. We introduce diversity as a new metric to discourage submissions aimed at producing repetitive structures. Furthermore, we allow submission of a Python program instead of a prompt text file for greater flexibility in implementing advanced PE approaches, which may require control flow, including conditions and iterations. We also make several improvements to the evaluation pipeline with a better classifier for similarity evaluation and better-performing function signatures. We thoroughly evaluate the effectiveness of the new metric and the improved classifier. Additionally, we perform an ablation study to select a function signature to instruct ChatGPT for level generation. Finally, we provide implementation examples of various PE techniques in Python and evaluate their preliminary performance. We hope this competition serves as a resource and platform for learning about PE and PCG in general.
Paper Structure (15 sections, 3 tables)