Adapting a World Model for Trajectory Following in a 3D Game
Marko Tot, Shu Ishida, Abdelhak Lemkhenter, David Bignell, Pallavi Choudhury, Chris Lovett, Luis França, Matheus Ribeiro Furtado de Mendonça, Tarun Gupta, Darren Gehring, Sam Devlin, Sergio Valcarcel Macua, Raluca Georgescu
TL;DR
The paper addresses trajectory following in a stochastic 3D game by reframing imitation learning through an Inverse Dynamics Model (IDM) built on world-model embeddings. It evaluates six IDM configurations, formed by pairing three encoders (ConvNeXt, DINOv2, WHAM) with two policy heads (GPT, MLP), across General, Specific, and Fine-tuned data regimes, and introduces four future conditioning strategies to reduce distribution drift. ConvNeXt-GPT performs best in the General setting, DINOv2 variants perform best in low-data scenarios, and fine-tuned ConvNeXt variants perform strongly, while WHAM often underperforms due to a mismatch between training and evaluation. These results offer practical guidance on encoder and policy-head selection and on future conditioning for trajectory replication in complex, partially observable game environments, though generalization remains an open challenge.
Abstract
Imitation learning is a powerful tool for training agents by leveraging expert knowledge, and being able to replicate a given trajectory is an integral part of it. In complex environments, like modern 3D video games, distribution shift and stochasticity necessitate robust approaches beyond simple action replay. In this study, we apply Inverse Dynamics Models (IDM) with different encoders and policy heads to trajectory following in a modern 3D video game, Bleeding Edge. Additionally, we investigate several future alignment strategies that address the distribution shift caused by aleatoric uncertainty and the agent's imperfections. We measure both the trajectory deviation distance and the first significant deviation point between the reference and the agent's trajectory, and show that the optimal configuration depends on the chosen setting. Our results show that in a diverse data setting, a GPT-style policy head with an encoder trained from scratch performs best; in the low-data regime, a DINOv2 encoder with the GPT-style policy head gives the best results; and when pre-trained on a diverse setting and fine-tuned for a specific behaviour setting, the GPT-style and MLP-style policy heads achieve comparable results.
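The two evaluation measures named above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that trajectories are time-aligned lists of (x, y, z) positions, that the deviation distance is the mean per-step Euclidean distance, and that a deviation is "significant" once it exceeds a fixed threshold. The function name `deviation_metrics`, the `threshold` parameter, and the sample positions are all hypothetical.

```python
import math


def deviation_metrics(reference, agent, threshold=1.0):
    """Compare two position trajectories step by step.

    reference, agent: sequences of (x, y, z) positions, one per timestep
        (assumed to be time-aligned; this is an illustrative assumption).
    threshold: hypothetical distance beyond which a deviation counts as
        significant.
    Returns (mean_distance, first_deviation_step); the step is None if the
    agent never exceeds the threshold.
    """
    # Only compare the timesteps present in both trajectories.
    steps = min(len(reference), len(agent))
    if steps == 0:
        raise ValueError("both trajectories must contain at least one position")

    # Per-step Euclidean distance between reference and agent positions.
    distances = [math.dist(reference[t], agent[t]) for t in range(steps)]
    mean_distance = sum(distances) / steps

    # Index of the first step whose distance exceeds the threshold, if any.
    first_deviation_step = next(
        (t for t, d in enumerate(distances) if d > threshold), None
    )
    return mean_distance, first_deviation_step


# Made-up positions: the agent drifts sideways away from a straight reference path.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
agent = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 2.0, 0.0), (3.0, 4.0, 0.0)]
mean_distance, first_step = deviation_metrics(reference, agent, threshold=1.0)
```

With these made-up positions the per-step distances are 0, 0.5, 2 and 4, so the mean distance is 1.625 and the first step past the threshold is index 2. A later first deviation step together with a lower mean distance would indicate closer trajectory following under these assumed definitions.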
