Difficulty Modelling in Mobile Puzzle Games: An Empirical Study on Different Methods to Combine Player Analytics and Simulated Data

Jeppe Theiss Kristensen; Paolo Burelli

TL;DR

The paper tackles the challenge of estimating perceived puzzle difficulty in mobile games, especially for new content, by proposing a two-step framework that combines playtest-agent data with historic, player, and level data. It evaluates three prediction methods (Random Forest, Artificial Neural Networks, and Factorisation Machines) using Lily's Garden as a case study, covering both personalised and cohort-level predictions. The findings show that Factorisation Machines perform best for existing content when player and level data are available, while Neural Networks excel in cold-start scenarios. Adding agent data generally improves predictions but can cause overfitting for some models. The work provides practical guidance for live-service game studios on data collection and model selection to support adaptive difficulty and content design, with implications for both immediate feedback to designers and automated content generation.

Abstract

Difficulty is one of the key drivers of player engagement and it is often one of the aspects that designers tweak most to optimise the player experience; operationalising it is, therefore, a crucial task for game development studios. A common practice consists of creating metrics out of data collected by player interactions with the content; however, this allows for estimation only after the content is released and does not consider the characteristics of potential future players. In this article, we present a number of potential solutions for the estimation of difficulty under such conditions, and we showcase the results of a comparative study intended to understand which method and which types of data perform better in different scenarios. The results reveal that models trained on a combination of cohort statistics and simulated data produce the most accurate estimations of difficulty in all scenarios. Furthermore, among these models, artificial neural networks show the most consistent results.

Paper Structure (21 sections, 1 equation, 9 figures, 1 table)

Introduction
Related work
Predicting difficulty
Playtesting agents
Case study: Lily's Garden
Methods
Playtest agent
Prediction methods
Random forest
Artificial neural networks
Factorisation machines
Data
Lily's Garden case study data
Experiment A: personalised predictions
Personalised predictions with historic data available
...and 6 more sections

Figures (9)

Figure 1: Example puzzle level in Lily's Garden
Figure 2: The average attempts per complete over the first 500 levels investigated. A rolling mean with window size 12 is also shown to visualise the trend. The vertical grey bars indicate tutorial levels. Adapted from Kristensen et al. (2022).
Figure 3: Two examples of the attempt distribution, on an easy and a hard level (levels 5 and 383, respectively). Adapted from Kristensen et al. (2022).
Figure 4: Train/test split of the data for personalised predictions on levels with historic data and on levels in the cold-start scenario. In this study we use $n_{\textup{obs}}=100$. The greyed-out area marks observations that are ignored to ensure the experiments are evaluated on the same content. Adapted from Kristensen et al. (2022).
Figure 5: RMSE of personalised predictions on content that has been played by other players.
...and 4 more figures
