Future Research Avenues for Artificial Intelligence in Digital Gaming: An Exploratory Report

Markus Dablander

TL;DR

The paper surveys five high-potential AI research directions for digital gaming: large language models for game agent modelling, neural cellular automata for procedural content generation, deep surrogate modelling to accelerate expensive in-game simulations, self-supervised video game state representation learning, and generative models of interactive worlds from unlabelled videos. It frames these avenues as an exploratory, non-exhaustive selection backed by concrete prior work and plausible applications, aiming to spark rigorous, targeted follow-up research. By outlining methodological approaches and potential benefits for realism, content variety, and training efficiency, the author argues that games can drive both practical advances and insights applicable to broader AI. The report also discusses significant challenges (computational demands, data privacy, interpretability, and integration into development pipelines) that must be addressed to realize these benefits at scale.

Abstract

Video games are a natural and synergistic application domain for artificial intelligence (AI) systems, offering the potential to enhance player experience and immersion while also providing valuable benchmarks and virtual environments to advance AI technologies in general. This report presents a high-level overview of five promising research pathways for applying state-of-the-art AI methods, particularly deep learning, to digital gaming within the context of the current research landscape. The objective of this work is to outline a curated, non-exhaustive list of encouraging research directions at the intersection of AI and video games that may serve to inspire more rigorous and comprehensive research efforts in the future. We discuss (i) investigating large language models as core engines for game agent modelling, (ii) using neural cellular automata for procedural game content generation, (iii) accelerating computationally expensive in-game simulations via deep surrogate modelling, (iv) leveraging self-supervised learning to obtain useful video game state embeddings, and (v) training generative models of interactive worlds using unlabelled video data. We also briefly address current technical challenges associated with the integration of advanced deep learning systems into video game development, and indicate key areas where further progress is likely to be beneficial.

Paper Structure

This paper contains 8 sections, 9 equations, 6 figures.

Figures (6)

  • Figure 1: Simple, high-level overview of a conceivable LLM-based cognitive architecture for a video game agent, strongly influenced by the works of Park et al. (2023) and Hu et al. (2024). The perception module translates game environment features (pixels, statistical features, vectorial embeddings, etc.) extracted from the game state into textual descriptions. The memory module stores past textual perceptions, as well as other memory items that are either predetermined (fixed character information, basic goals, etc.) or generated by the thinking module (novel knowledge, reflections, goals, procedural skills, etc.). The thinking module, based on a large language model (LLM), processes current textual perceptions and relevant textual memory items retrieved from the memory module, and outputs textual action plans and new memory items. The textual action plans are converted by the action module into low-level sequences of in-game behaviours that are executed to change the game state.
  • Figure 2: Conceptual diagram of how a neural cellular automaton (NCA), once trained, could iteratively generate the target image of a tree (image not generated by actual NCA, used for illustrative purposes only). An NCA is a cellular automaton whose local transition function is parametrised by a neural network. Mordvintsev et al. (2020) showed how an NCA can be trained with gradient-based methods to organically grow an arbitrary, predefined target pattern from a single initial cell. The NCA can also learn to converge back to its intended target pattern after being disturbed, in a manner that resembles self-regeneration.
  • Figure 3: Illustration of the elementary idea behind deep surrogate modelling: A computationally expensive function $f$ is repeatedly evaluated to generate a training data set $\mathfrak{D}$, which is then used to train a deep network $\Phi_{\theta}$. After training, $\Phi_{\theta}$ acts as a computationally fast approximation of $f$.
  • Figure 4: Schematic overview of a prototypical joint-embedding predictive architecture (JEPA) (LeCun, 2022) for self-supervised learning. The variables $x$ and $y$ could, for instance, represent images of game pixels at times $t$ and $t + \delta$, the encoders $\Phi_{\theta}$ and $\Psi_{\gamma}$ could be convolutional neural networks that map the images to embeddings $v_x, v_y$, the latent variable $z$ could symbolise the action taken by the player at time $t$, and $P_{\eta}$ could be a multilayer perceptron whose output $P_{\eta}(v_x, z)$ aims to approximate $v_y$.
  • Figure 5: Schematic diagram illustrating the inference process of the trained Genie model (Bruce et al., 2024) to generate a playable platformer game from a given image prompt $x_1$ (images not generated by actual Genie system, used for illustrative purposes only). The video tokeniser and the latent action model respectively translate the prompt image $x_1$ and the initial player action input into embeddings $z_1$ and $a_1$, which are subsequently used by the dynamics model to predict the next tokenised frame $z_2$. The compressed representation $z_2$ is then converted by the video tokeniser into a visible game frame $x_2$. This process is iteratively repeated using previously generated image tokens and recorded input actions to give rise to a sequence of interactive game frames $(x_1, x_2, x_3, ...)$.
  • ...and 1 more figure
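
The perceive, remember, think, act loop described in the Figure 1 caption can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the architecture from the report: the `Agent` class, the `stub_llm` function, the game-state keys, and the plan-to-command table are all hypothetical, and a keyword rule stands in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    llm: Callable[[str], str]                        # hypothetical text-in, text-out model
    memory: List[str] = field(default_factory=list)  # memory module: list of textual items

    def perceive(self, game_state: Dict[str, object]) -> str:
        # Perception module: game state features -> textual description.
        return ", ".join(f"{k}={v}" for k, v in sorted(game_state.items()))

    def retrieve(self, k: int = 3) -> List[str]:
        # Memory retrieval: here simply the k most recent items (an assumption).
        return self.memory[-k:]

    def think(self, perception: str) -> str:
        # Thinking module: prompt the model with retrieved memories and the
        # current perception, then store the perception and the new plan.
        prompt = "Memory:\n" + "\n".join(self.retrieve()) + "\nNow: " + perception
        plan = self.llm(prompt)
        self.memory.append(perception)
        self.memory.append("plan: " + plan)
        return plan

    def act(self, plan: str) -> List[str]:
        # Action module: textual plan -> low-level in-game commands (toy table).
        table = {"flee": ["turn_around", "run"], "attack": ["draw_sword", "strike"]}
        return table.get(plan, ["wait"])

    def step(self, game_state: Dict[str, object]) -> List[str]:
        return self.act(self.think(self.perceive(game_state)))


def stub_llm(prompt: str) -> str:
    # Stand-in for a real LLM: looks only at the current perception.
    now = prompt.rsplit("Now: ", 1)[-1]
    return "flee" if "health_low=True" in now else "attack"


if __name__ == "__main__":
    agent = Agent(llm=stub_llm)
    print(agent.step({"enemy_near": True, "health_low": True}))
    print(agent.memory)
```

Keeping every interface between modules textual, as the caption describes, means the model behind `llm` can be swapped without touching perception or action code.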
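
To make the Figure 2 caption concrete, the sketch below shows a single update step of a cellular automaton whose local transition function is a small neural network. It is a simplified, untrained illustration: the single-channel grid, the 3x3 neighbourhood with zero padding, the residual update, and the names `make_rule` and `nca_step` are assumptions chosen for brevity. No gradient-based training towards a target pattern is performed here.

```python
import random


def make_rule(hidden, rng):
    # Randomly initialised parameters of the local transition network
    # (9 neighbourhood inputs -> `hidden` ReLU units -> 1 output).
    w_hidden = [[rng.uniform(-1.0, 1.0) for _ in range(9)] for _ in range(hidden)]
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
    return w_hidden, w_out


def nca_step(grid, w_hidden, w_out):
    # Apply the same small network to every cell's 3x3 neighbourhood and
    # add its output to the cell state (residual update, no bias terms).
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            neigh = [
                grid[i + di][j + dj]
                if 0 <= i + di < rows and 0 <= j + dj < cols
                else 0.0  # zero padding outside the grid
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
            ]
            hidden = [max(0.0, sum(w * v for w, v in zip(row, neigh))) for row in w_hidden]
            delta = sum(w * h for w, h in zip(w_out, hidden))
            new_grid[i][j] = grid[i][j] + delta
    return new_grid


if __name__ == "__main__":
    rng = random.Random(0)
    rule = make_rule(hidden=4, rng=rng)
    grid = [[0.0] * 7 for _ in range(7)]
    grid[3][3] = 1.0  # single initial seed cell
    for _ in range(3):
        grid = nca_step(grid, *rule)
    for row in grid:
        print(" ".join(f"{v:+.2f}" for v in row))
```

Because the rule is purely local, information from the seed cell can spread by at most one cell per step, which is why growth appears gradual and organic.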
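
The workflow in the Figure 3 caption (evaluate $f$ repeatedly, collect $\mathfrak{D}$, fit $\Phi_{\theta}$, then use $\Phi_{\theta}$ in place of $f$) can be sketched as follows. This is a minimal sketch under stated assumptions: `expensive_f` is a hypothetical stand-in for a costly in-game simulation, and a tiny one-hidden-layer network trained with plain stochastic gradient descent stands in for the deep network.

```python
import math
import random


def expensive_f(x):
    # Hypothetical stand-in for a computationally expensive simulation f.
    return math.sin(3.0 * x)


def make_dataset(n, rng):
    # Evaluate f repeatedly to build the training set D = {(x, f(x))}.
    return [(x, expensive_f(x)) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


class TinyMLP:
    # One hidden tanh layer: a toy stand-in for the deep surrogate network.
    def __init__(self, hidden, rng):
        self.w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2, h

    def __call__(self, x):
        return self.forward(x)[0]


def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def train(model, data, epochs=300, lr=0.02):
    # Per-sample gradient descent on the loss 0.5 * (prediction - target)^2.
    for _ in range(epochs):
        for x, y in data:
            pred, h = model.forward(x)
            err = pred - y
            for j in range(len(h)):
                grad_pre = err * model.w2[j] * (1.0 - h[j] ** 2)  # through tanh
                model.w2[j] -= lr * err * h[j]
                model.w1[j] -= lr * grad_pre * x
                model.b1[j] -= lr * grad_pre
            model.b2 -= lr * err


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_dataset(64, rng)
    model = TinyMLP(hidden=8, rng=rng)
    print("MSE before training:", mse(model, data))
    train(model, data)
    print("MSE after training: ", mse(model, data))
```

In a real setting the payoff comes from the cost asymmetry: data generation and training are paid once offline, while each later call to the surrogate is cheap compared with evaluating $f$.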
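
The prediction task in the Figure 4 caption can be written out explicitly. The block below is one possible formalisation rather than the objective used in the report: the squared Euclidean distance, the symbol $\hat{v}_y$ for the predicted embedding, and the loss symbol $\mathcal{L}$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
v_x &= \Phi_{\theta}(x), \qquad v_y = \Psi_{\gamma}(y), \\
\hat{v}_y &= P_{\eta}(v_x, z), \\
\mathcal{L}(\theta, \gamma, \eta) &= \lVert \hat{v}_y - v_y \rVert_2^2 .
\end{aligned}
```

With $x$ and $y$ taken as game frames at times $t$ and $t + \delta$ and $z$ as the player action, minimising $\mathcal{L}$ asks the predictor to anticipate the embedding of the future frame instead of its raw pixels. A loss of this form alone admits a trivial solution in which both encoders output a constant, so practical joint-embedding methods typically add some mechanism to prevent this representational collapse.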