Generation of Probabilistic Synthetic Data for Serious Games: A Case Study on Cyberbullying
Jaime Pérez, Mario Castro, Edmond Awad, Gregorio López
TL;DR
The paper addresses the need for synthetic data in serious games by proposing a modular simulator that generates probabilistic data for interactive narratives. It combines Bayesian networks (BNs), which inject external knowledge, with an Item Response Theory–based decision model that simulates how synthetic players respond to questions, and it is demonstrated on a cyberbullying game (RAYUELA). The authors show identifiability and robustness of the generated data through hierarchical Bayesian inference in a BN-informed, two-cluster risk framework, using a 665-person survey to calibrate the model and 500 synthetic players across 15 questions. The approach offers a scalable way to begin data modelling early, improve privacy and fairness, and accelerate the development of serious games, while enabling players to be clustered by risk propensity. The work provides a concrete architecture and methodology that others can adapt to different serious-game domains and datasets, with validation on real players left to future work.
Abstract
Synthetic data generation has been a growing area of research in recent years. However, its potential applications in serious games have not been thoroughly explored. Advances in this field could allow data modelling and analysis to begin earlier, as well as speed up the development process. To help fill this gap in the literature, we propose a simulator architecture for generating probabilistic synthetic data for serious games based on interactive narratives. This architecture is designed to be generic and modular so that other researchers can apply it to similar problems. To simulate the interaction of synthetic players with questions, we use a cognitive testing model based on the Item Response Theory framework. We also show how probabilistic graphical models (in particular, Bayesian networks) can be used to introduce expert knowledge and external data into the simulation. Finally, we apply the proposed architecture and methods to a use case: a serious game focused on cyberbullying. We perform Bayesian inference experiments using a hierarchical model to demonstrate the identifiability and robustness of the generated data.
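To make the Item Response Theory idea concrete, the following minimal sketch simulates synthetic players answering binary questions with a standard two-parameter logistic (2PL) model. It is an illustration under stated assumptions, not the simulator described in the paper: the function name `simulate_responses`, the cluster proportion, the latent-trait means and spreads, and the item-parameter distributions are all hypothetical choices, and the Bayesian-network component is replaced here by a simple two-cluster draw. Only the default sizes (500 players, 15 questions) follow the figures quoted in the TL;DR.

```python
import numpy as np


def simulate_responses(n_players=500, n_items=15, p_high_risk=0.3, seed=0):
    """Simulate binary answers with a 2PL IRT model (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Hypothetical two-cluster structure: 1 = higher-risk player, 0 = lower-risk player.
    cluster = rng.binomial(1, p_high_risk, size=n_players)

    # Latent risk propensity per player; cluster means and spread are assumed values.
    theta = rng.normal(loc=np.where(cluster == 1, 1.0, -1.0), scale=0.5)

    # Item parameters: discrimination a > 0 and difficulty b (assumed distributions).
    a = rng.lognormal(mean=0.0, sigma=0.25, size=n_items)
    b = rng.normal(loc=0.0, scale=1.0, size=n_items)

    # 2PL model: P(risky answer) = sigmoid(a_j * (theta_i - b_j)).
    logits = a[None, :] * (theta[:, None] - b[None, :])
    prob = 1.0 / (1.0 + np.exp(-logits))

    # One Bernoulli draw per (player, question) pair.
    responses = rng.binomial(1, prob)
    return cluster, theta, prob, responses


cluster, theta, prob, responses = simulate_responses()
print(responses.shape)  # one row per synthetic player, one column per question
```

A response matrix of this kind is what a hierarchical Bayesian model would later be fitted to in order to check whether the latent traits and item parameters can be recovered from the generated data.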
