player2vec: A Language Modeling Approach to Understand Player Behavior in Games
Tianze Wang, Maryam Honari-Jahromi, Styliani Katsarou, Olga Mikheeva, Theodoros Panagiotakopoulos, Sahar Asadi, Oleg Smirnov
TL;DR
This work addresses the shortage of self-supervised user representations in gaming by extending a long-range Transformer (Longformer) to model sessionized player behavior as token sequences. It preprocesses mobile-game event logs into textual sequences, enabling masked language modeling (MLM) pretraining on unlabeled data to produce player embeddings from input sequences of up to $4{,}096$ tokens. On a dataset of $125{,}000$ sessions from $10{,}000$ players over $15$ days, three model sizes (small, medium, and large) achieve progressively better MLM metrics, with the largest model reaching $\text{Accuracy}=0.958$, $\text{Perplexity}=1.161$, and $\text{Cross-Entropy}=0.149$. Embedding-space analysis using $t$-SNE and an $8$-component Gaussian Mixture Model reveals semantically meaningful player segments and previously unknown subpopulations, supporting downstream applications in personalization and product insights. The work highlights a scalable, self-supervised path to understanding player behavior and suggests future directions in fine-tuning, multitask learning, and noise robustness for real-world deployment.
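The perplexity and cross-entropy figures for the largest model can be checked against each other, since perplexity is conventionally the exponential of the mean per-token cross-entropy. The sketch below assumes the cross-entropy is reported in nats (natural logarithm), which the summary does not state explicitly; the function name is illustrative.

```python
import math


def perplexity_from_cross_entropy(cross_entropy_nats: float) -> float:
    """Perplexity as the exponential of mean per-token cross-entropy (in nats)."""
    return math.exp(cross_entropy_nats)


# Values for the largest model, as reported in the summary.
reported_cross_entropy = 0.149
reported_perplexity = 1.161

derived = perplexity_from_cross_entropy(reported_cross_entropy)
print(round(derived, 3))  # → 1.161
```

Under that assumption the two reported numbers agree to three decimal places.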
Abstract
Methods for learning latent user representations from historical behavior logs have gained traction for recommendation tasks in e-commerce, content streaming, and other settings. However, this area remains relatively underexplored in video and mobile gaming contexts. In this work, we present a novel method that addresses this gap by extending a long-range Transformer model from the natural language processing domain to player behavior data. We discuss the specifics of behavior tracking in games and propose preprocessing and tokenization approaches that treat in-game events as analogous to words in sentences, which enables learning player representations in a self-supervised manner without ground-truth annotations. We experimentally demonstrate the efficacy of the proposed approach in fitting the distribution of behavior events by evaluating intrinsic language modeling metrics. Furthermore, we qualitatively analyze the emerging structure of the learned embedding space and show its value for generating insights into behavior patterns to inform downstream applications.
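To make the events-as-words analogy concrete, the following is a minimal sketch of how a session of in-game events might be turned into a token sequence and masked for self-supervised training. It is not the authors' pipeline: the event names, the `name` field, the 15% masking rate, and the `-100` ignore sentinel are all assumptions chosen for illustration.

```python
import random

MASK, PAD = "[MASK]", "[PAD]"
IGNORE = -100  # assumed sentinel for positions that do not contribute to the loss


def events_to_tokens(events):
    """Map each logged event (a dict with a hypothetical 'name' field) to one token."""
    return [event["name"] for event in events]


def build_vocab(sequences):
    """Assign an integer id to every distinct token, reserving ids for PAD and MASK."""
    vocab = {PAD: 0, MASK: 1}
    for sequence in sequences:
        for token in sequence:
            vocab.setdefault(token, len(vocab))
    return vocab


def mask_tokens(token_ids, mask_id, mask_prob=0.15, seed=0):
    """Replace a random subset of ids with mask_id; labels keep the original ids there."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for token_id in token_ids:
        if rng.random() < mask_prob:
            inputs.append(mask_id)
            labels.append(token_id)
        else:
            inputs.append(token_id)
            labels.append(IGNORE)
    return inputs, labels


# Hypothetical session: event names are invented for illustration only.
session = [{"name": n} for n in
           ["session_start", "level_start", "move", "move", "booster_used",
            "level_complete", "level_start", "move", "level_failed", "session_end"]]

tokens = events_to_tokens(session)
vocab = build_vocab([tokens])
token_ids = [vocab[t] for t in tokens]
inputs, labels = mask_tokens(token_ids, vocab[MASK], mask_prob=0.15, seed=0)
```

A model trained to recover the original ids at the masked positions needs no labels beyond the event log itself, which is the sense in which the approach is self-supervised.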
