Zero-Shot Reasoning: Personalized Content Generation Without the Cold Start Problem

Davor Hafnar; Jure Demšar

TL;DR

This work tackles the cold-start and data-cost barriers of personalized procedural content generation by employing zero-shot reasoning with GPT-4 to generate personalized mobile game levels from real-time gameplay data. An end-to-end production-oriented pipeline combines a Unity Match-3 game, Google Cloud data collection, and a backend PCG module that serves three JSON-formatted level parameter sets per completed level. Bayesian analyses reveal that LLM-based personalized PCG improves overall level completion rates over traditional PCG, while ratings depend on how dropouts are accounted for, highlighting both engagement gains and nuanced user satisfaction. The study demonstrates the practicality and scalability of using zero-shot LLMs for production PCG and points to broad future opportunities for expanding personalization across game genres and mechanics.

Abstract

Procedural content generation uses algorithmic techniques to create large amounts of new game content at much lower production cost. Newer approaches rely on machine learning, but these methods usually require collecting large amounts of data and developing and training fairly complex models, both of which can be extremely time-consuming and expensive. The core of our research is to explore whether large language models offer a more practical and generalizable approach that lowers the barrier to personalized procedural content generation. Matching game content to player preferences benefits both players, who enjoy the game more, and developers, who increasingly depend on players enjoying the game before they can monetize it. This paper therefore presents a novel approach to personalization in which large language models propose levels based on gameplay data continuously collected from individual players. We compared levels generated with our approach against levels generated with more traditional procedural generation techniques. Our easily reproducible method proved viable in a production setting and outperformed traditional methods in the probability that a player will not quit the game mid-level.
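
The backend step summarized above (build a prompt from gameplay data, constrain the reply to JSON, receive three level parameter sets) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter names and bounds in `PARAM_BOUNDS`, the function name `propose_levels`, and the helper names are all hypothetical, and the actual call to the LLM API is deliberately left out.

```python
import json

# Hypothetical parameter names and bounds, chosen only for illustration;
# the parameters used in the paper are not listed in this summary.
PARAM_BOUNDS = {
    "board_width": (5, 9),
    "board_height": (5, 9),
    "num_colours": (3, 7),
    "move_limit": (10, 40),
}


def build_function_schema(bounds, levels_per_batch=3):
    """Build a JSON-schema function definition describing one batch of levels."""
    level_schema = {
        "type": "object",
        "properties": {
            name: {"type": "integer", "minimum": lo, "maximum": hi}
            for name, (lo, hi) in bounds.items()
        },
        "required": list(bounds),
    }
    return {
        "name": "propose_levels",  # hypothetical function name
        "description": "Propose the next levels for this player.",
        "parameters": {
            "type": "object",
            "properties": {
                "levels": {
                    "type": "array",
                    "items": level_schema,
                    "minItems": levels_per_batch,
                    "maxItems": levels_per_batch,
                }
            },
            "required": ["levels"],
        },
    }


def build_prompt(history):
    """Turn per-level gameplay records into a plain-text prompt."""
    lines = ["Recent levels played by this player:"]
    for record in history:
        lines.append(json.dumps(record, sort_keys=True))
    lines.append("Propose parameters for the next levels this player would enjoy.")
    return "\n".join(lines)


def parse_levels(arguments_json, bounds, levels_per_batch=3):
    """Parse the model's function-call arguments and clamp values into bounds."""
    levels = json.loads(arguments_json)["levels"]
    if len(levels) != levels_per_batch:
        raise ValueError(f"expected {levels_per_batch} levels, got {len(levels)}")
    return [
        {name: max(lo, min(hi, int(level[name]))) for name, (lo, hi) in bounds.items()}
        for level in levels
    ]
```

Clamping on the server is a defensive choice assumed here: even with a schema attached to the request, validating the reply before it reaches the game client keeps out-of-range values from producing unplayable levels.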

Paper Structure (21 sections, 4 equations, 7 figures, 1 table)

Introduction
Related work
Personalized PCG
Large Language Models
Zero-shot reasoning
Approach and basic components
The match 3 game
Data collection
Prompting the LLM
The Procedural content generation module
Personalized gameplay
Experiment Design
Serving initial levels
Participants
Statistical analysis
...and 6 more sections

Figures (7)

Figure 1: An illustration of game mechanics in our Match 3 puzzle game. (a) A board is filled with pieces of different colours. (b) The player's objective is to find and match three board elements of the same colour by swapping neighbouring pieces. (c) Once three or more elements of the same colour form a line, the player is awarded a certain number of points, and the matching elements are removed from the board and replaced by new ones.
Figure 2: Personalized PCG pipeline diagram. Once the player completes and rates a level, the data is sent to the backend server. There, the prompt is constructed and the level parameter boundaries are described. Strict JSON output is enforced through the function-calling feature of the GPT-4 API. The data is then sent to the LLM, and three levels are returned as JSON. The mobile game then generates the levels from the suggested parameters. Ideally, the player plays only the first of the three levels and a new batch is generated once that level is completed; three are returned to cover possible latency, because level parameters are not generated on the device.
Figure 3: Level completion probabilities comparing LLM-generated levels vs. traditional PCG. Players were more likely to complete levels generated using LLMs than those generated traditionally. The same holds both when looking at all levels and when looking just at the first level.
Figure 4: A visualization of the beta coefficient comparing ratings for LLM-based PCG against traditional PCG, excluding dropouts. Since the beta coefficient is very likely negative ($P = 0.99$), we can claim with high confidence that levels generated with traditional PCG received higher ratings than those generated with LLM PCG. Note that this analysis does not account for dropouts, that is, players who left the game before completing and rating the level.
Figure 5: A visualization of the beta coefficient comparing ratings for LLM-based PCG against traditional PCG, including dropouts. Since the beta coefficient is very likely positive ($P = 0.99$), we can claim with high confidence that levels generated with LLM PCG received higher ratings than those generated with traditional PCG once dropouts are accounted for. This suggests that many players were unhappy enough with traditional PCG levels to quit mid-level.
...and 2 more figures
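
As a rough illustration of the kind of comparison behind Figure 3, the sketch below estimates the posterior probability that one generator has a higher level completion rate than another. It uses a simple Beta-Binomial model with uniform priors, which is a stand-in chosen for brevity and not necessarily the model used in the paper. The function name `prob_a_beats_b` is hypothetical, and the counts in the usage line are made-up placeholders, not results from the study.

```python
import random


def prob_a_beats_b(completed_a, started_a, completed_b, started_b,
                   draws=20000, seed=0):
    """Estimate P(rate_a > rate_b) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # With a uniform prior, the posterior of a binomial rate is
        # Beta(successes + 1, failures + 1).
        rate_a = rng.betavariate(completed_a + 1, started_a - completed_a + 1)
        rate_b = rng.betavariate(completed_b + 1, started_b - completed_b + 1)
        wins += rate_a > rate_b
    return wins / draws


# Placeholder counts for illustration only; NOT data from the study.
p = prob_a_beats_b(completed_a=90, started_a=100, completed_b=60, started_b=100)
```

Treating a level that was started but never finished as a non-completion is what lets dropouts count against a generator in this kind of model.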
