Adapting a World Model for Trajectory Following in a 3D Game

Marko Tot, Shu Ishida, Abdelhak Lemkhenter, David Bignell, Pallavi Choudhury, Chris Lovett, Luis França, Matheus Ribeiro Furtado de Mendonça, Tarun Gupta, Darren Gehring, Sam Devlin, Sergio Valcarcel Macua, Raluca Georgescu

TL;DR

The paper addresses trajectory following in a stochastic 3D game by reframing imitation learning through an Inverse Dynamics Model built on world-model embeddings. It evaluates six IDM configurations derived from three encoders (ConvNeXt, DINOv2, WHAM) and two heads (GPT, MLP) across General, Specific, and Fine-tuned data regimes, and introduces four future conditioning strategies to reduce distribution drift. Key findings show ConvNeXt-GPT excels in the general setting, DINOv2 variants shine in low-data scenarios, and fine-tuning with ConvNeXt variants yields strong performance, while WHAM often underperforms due to training-evaluation mismatch. These results provide practical guidance on encoder-head selection and future conditioning for robust trajectory replication in complex, partially observable game environments, highlighting ongoing generalization challenges and avenues for future robustness work.

Abstract

Imitation learning is a powerful tool for training agents by leveraging expert knowledge, and the ability to replicate a given trajectory is an integral part of it. In complex environments, such as modern 3D video games, distribution shift and stochasticity necessitate robust approaches beyond simple action replay. In this study, we apply Inverse Dynamics Models (IDM) with different encoders and policy heads to trajectory following in a modern 3D video game, Bleeding Edge. Additionally, we investigate several future alignment strategies that address the distribution shift caused by aleatoric uncertainty and by imperfections of the agent. We measure both the trajectory deviation distance and the first significant deviation point between the reference trajectory and the agent's trajectory, and show that the optimal configuration depends on the chosen setting. Our results show that in a diverse data setting, a GPT-style policy head with an encoder trained from scratch performs best; in the low-data regime, a DINOv2 encoder with a GPT-style policy head gives the best results; and GPT-style and MLP-style policy heads achieve comparable results when pre-trained on a diverse setting and fine-tuned for a specific behaviour setting.
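
The two evaluation quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so everything here is an assumption for illustration: trajectories are taken to be equal-rate lists of $(x, y)$ positions, deviation is the per-step Euclidean distance, and a deviation counts as "significant" once it exceeds a hypothetical `threshold`. The function names are likewise hypothetical and are not taken from the paper.

```python
import math

def step_distances(reference, agent):
    """Per-step Euclidean distance between paired (x, y) points.

    Assumption: both trajectories are sampled at the same rate, so index i
    in one corresponds to index i in the other. Extra trailing points in
    the longer trajectory are ignored.
    """
    n = min(len(reference), len(agent))
    return [math.dist(reference[i], agent[i]) for i in range(n)]

def mean_deviation(reference, agent):
    """Average per-step distance; one plausible 'trajectory deviation distance'."""
    distances = step_distances(reference, agent)
    return sum(distances) / len(distances) if distances else 0.0

def first_significant_deviation(reference, agent, threshold):
    """Index of the first step whose distance exceeds `threshold`, else None.

    `threshold` is a hypothetical parameter; the abstract does not state
    how 'significant' is defined.
    """
    for i, d in enumerate(step_distances(reference, agent)):
        if d > threshold:
            return i
    return None

# Toy data: the agent follows the reference for two steps, then drifts.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
agent = [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0), (3.0, 4.0)]

print(mean_deviation(reference, agent))
print(first_significant_deviation(reference, agent, threshold=2.0))
```

Reporting both numbers is useful because they answer different questions: the mean summarises how far the agent strays overall, while the first-deviation index shows how long it stays on track before drifting.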

Paper Structure

This paper contains 36 sections, 6 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: A high-level overview of the IDM model. We encode two distinct trajectories: the current trajectory of the agent and the future conditioning. The resulting encodings are then passed into an IDM head, which selects the action to perform.
  • Figure 2: We evaluate three different encoders: a ConvNeXt encoder trained from scratch, a general pre-trained encoder (DINOv2), and a game-specific pre-trained World and Human Action Model (WHAM).
  • Figure 3: Example images of the Sky Garden and Dojo maps used for training and evaluation.
  • Figure 4: Sampled rollouts from Benchmark 1. The red line shows the reference trajectory, while the blue line shows the agent's path. The $x$ and $y$ axes represent the coordinates of the agent.
  • Figure A.1: Xbox controller input. Labels $1$ and $3$ mark the continuous stick inputs (each stick can move along two axes), while the other labels mark the discrete button inputs.
  • ...and 7 more figures