player2vec: A Language Modeling Approach to Understand Player Behavior in Games

Tianze Wang, Maryam Honari-Jahromi, Styliani Katsarou, Olga Mikheeva, Theodoros Panagiotakopoulos, Sahar Asadi, Oleg Smirnov

TL;DR

This work addresses the shortage of self-supervised user representations in gaming by extending a long-range Transformer (Longformer) to model sessionized player behavior as token sequences. It preprocesses mobile-game event logs into textual sequences, enabling masked language modeling (MLM) pretraining on unlabeled data and producing player embeddings from input sequences of up to $4{,}096$ tokens. On a dataset of $125{,}000$ sessions from $10{,}000$ players over $15$ days, three model sizes (small, medium, and large) achieve progressively better MLM metrics, with the largest model reaching $\text{Accuracy}=0.958$, $\text{Perplexity}=1.161$, and $\text{Cross-Entropy}=0.149$. Embedding-space analysis using $t$-SNE and an $8$-component Gaussian Mixture Model reveals semantically meaningful player segments and previously unknown subpopulations, supporting downstream applications in personalization and product insights. The work highlights a scalable, self-supervised path to understanding player behavior and suggests future directions in fine-tuning, multitask learning, and noise robustness for real-world deployment.
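The reported perplexity and cross-entropy can be cross-checked against each other. The sketch below assumes the cross-entropy is measured in nats (natural logarithm), in which case perplexity is its exponential; the summary does not state the unit, so this is a consistency check under that assumption rather than a description of the authors' evaluation code.

```python
import math

def perplexity_from_cross_entropy(cross_entropy_nats: float) -> float:
    """Perplexity is exp(cross-entropy) when cross-entropy is in nats (assumed here)."""
    return math.exp(cross_entropy_nats)

# Values quoted in the TL;DR for the largest model.
reported_cross_entropy = 0.149
reported_perplexity = 1.161

derived_perplexity = perplexity_from_cross_entropy(reported_cross_entropy)
print(f"derived perplexity: {derived_perplexity:.3f}")
```

Under the nats assumption, the derived value agrees with the reported perplexity to three decimal places.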

Abstract

Methods for learning latent user representations from historical behavior logs have gained traction for recommendation tasks in e-commerce, content streaming, and other settings. However, this area remains relatively underexplored in video and mobile gaming contexts. In this work, we present a novel method for overcoming this limitation by extending a long-range Transformer model from the natural language processing domain to player behavior data. We discuss specifics of behavior tracking in games and propose preprocessing and tokenization approaches that treat in-game events as analogous to words in sentences, which enables learning player representations in a self-supervised manner without ground-truth annotations. We experimentally demonstrate the efficacy of the proposed approach in fitting the distribution of behavior events by evaluating intrinsic language modeling metrics. Furthermore, we qualitatively analyze the emerging structure of the learned embedding space and show its value for generating insights into behavior patterns to inform downstream applications.
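To make the events-as-words analogy concrete, here is a minimal sketch of how a masked language modeling example could be built from one session of event tokens. The event names, the `[MASK]` string, and the masking probability are illustrative assumptions, not values from the paper, and the sketch uses plain masking only rather than any particular library's masking scheme.

```python
import random

MASK = "[MASK]"  # hypothetical mask token

def mask_events(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels holds the original token at each masked position and None elsewhere,
    so a model would be trained to predict only the hidden events.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels

# Hypothetical session: each in-game event plays the role of a word.
session = "app_start level_start move move booster_used level_end".split()
masked, labels = mask_events(session, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Because the targets are taken from the sequence itself, no ground-truth annotations are needed, which is the self-supervised property the abstract refers to.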

Paper Structure (11 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: An illustration of the categorization of events into sessions. The final event in session 1, marked as a game-end event, is expanded to show details about its associated fields and values.
  • Figure 2: (a) Histogram of session lengths in the dataset. (b) Distribution of player activity over a 15-day period. (c) Event distribution, where events belonging to similar semantic classes are grouped together. Plots in (a) and (b) show data up to the 99th percentile.
  • Figure 3: Data preprocessing pipeline. Raw event logs are passed through filtering, type conversion, grouping, and joining stages to produce textual data.
  • Figure 4: (a) t-SNE of latent embedding space obtained from pre-trained player2vec-large with subsequent GMM clustering. (b) Histogram of the quantized player events in identified clusters. Cluster 8 is excluded because of its small size and absence of gameplay.
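The four stages named in the Figure 3 caption (filtering, type conversion, grouping, joining) can be sketched in a few lines. This is an illustrative approximation only: the record fields (`player`, `session`, `ts`, `type`), the event names, and the assumption that session identifiers are already attached to each event are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def preprocess(raw_events, keep_types):
    """Turn raw event records into one text string per (player, session)."""
    sessions = defaultdict(list)
    for event in raw_events:
        # Filtering: drop event types that are not of interest.
        if event["type"] not in keep_types:
            continue
        # Type conversion: timestamps arrive as strings in this toy example.
        timestamp = int(event["ts"])
        # Grouping: collect events under their player and session.
        sessions[(event["player"], event["session"])].append((timestamp, event["type"]))
    # Joining: order events by time and join their names into a "sentence".
    return {
        key: " ".join(name for _, name in sorted(items))
        for key, items in sessions.items()
    }

raw = [
    {"player": "p1", "session": "s1", "ts": "3", "type": "level_end"},
    {"player": "p1", "session": "s1", "ts": "1", "type": "app_start"},
    {"player": "p1", "session": "s1", "ts": "2", "type": "debug_ping"},
    {"player": "p1", "session": "s2", "ts": "10", "type": "app_start"},
]
texts = preprocess(raw, keep_types={"app_start", "level_end"})
print(texts)
```

The resulting strings are the kind of textual data that a tokenizer and a masked language model could then consume.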