Difficulty Modelling in Mobile Puzzle Games: An Empirical Study on Different Methods to Combine Player Analytics and Simulated Data
Jeppe Theiss Kristensen, Paolo Burelli
TL;DR
The paper tackles the challenge of estimating perceived puzzle difficulty in mobile games, especially for new content, by proposing a two-step framework that combines playtest-agent data with historic, player, and level data. It evaluates three prediction methods—Random Forest, Artificial Neural Networks, and Factorisation Machines—using Lily's Garden as a case study, focusing on both personalised and cohort-level predictions. The findings show that Factorisation Machines perform best for existing content with player and level data, while Neural Networks excel in cold-start scenarios; adding agent data generally improves predictions but can cause overfitting for some models. The work provides practical guidance for live-service game studios on data collection and model selection to support adaptive difficulty and content design, with implications for both immediate feedback to designers and automated content generation.
Abstract
Difficulty is one of the key drivers of player engagement and it is often one of the aspects that designers tweak most to optimise the player experience; operationalising it is, therefore, a crucial task for game development studios. A common practice consists of creating metrics out of data collected by player interactions with the content; however, this allows for estimation only after the content is released and does not consider the characteristics of potential future players. In this article, we present a number of potential solutions for the estimation of difficulty under such conditions, and we showcase the results of a comparative study intended to understand which method and which types of data perform better in different scenarios. The results reveal that models trained on a combination of cohort statistics and simulated data produce the most accurate estimations of difficulty in all scenarios. Furthermore, among these models, artificial neural networks show the most consistent results.
