BehAVE: Behaviour Alignment of Video Game Encodings

Nemanja Rašajski, Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis

TL;DR

This paper introduces BehAVE, a video understanding framework that uses existing commercial video games for domain randomisation without accessing their simulation engines. An evaluation across 25 first-person shooter games demonstrates its robustness.

Abstract

Domain randomisation enhances the transferability of vision models across visually distinct domains with similar content. However, current methods heavily depend on intricate simulation engines, hampering feasibility and scalability. This paper introduces BehAVE, a video understanding framework that utilises existing commercial video games for domain randomisation without accessing their simulation engines. BehAVE taps into the visual diversity of video games for randomisation and uses textual descriptions of player actions to align videos with similar content. We evaluate BehAVE across 25 first-person shooter (FPS) games using various video and text foundation models, demonstrating its robustness in domain randomisation. BehAVE effectively aligns player behavioural patterns and achieves zero-shot transfer to multiple unseen FPS games when trained on just one game. In a more challenging scenario, BehAVE enhances the zero-shot transferability of foundation models to unseen FPS games, even when trained on a game of a different genre, with improvements of up to 22%. BehAVE is available online at https://github.com/nrasajski/BehAVE.

Paper Structure

This paper contains 15 sections, 1 equation, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: High level overview of the BehAVE framework. The t-SNE plots show encodings of short video sequences from 5 distinct FPS games: (a) indicates the domain gap between encodings of different games from a video foundation model, while (b) shows encodings aligned by BehAVE. The framework positions similar player behaviour encodings (e.g., aim gun) closely across visually diverse games like PUBG (left) and Apex Legends (right).
  • Figure 2: Overview of experiments and datasets used: (a) Behaviour Alignment: BehAVE is trained on synchronised gameplay video and player actions from the SMG-25 train dataset, and evaluated on unseen games from the SMG-25 test dataset. (b) Behaviour Classification: We test the transferability of a video classification task. BehAVE is trained independently on CS:GO and Minecraft, and transferred to the SMG-25 test dataset.
  • Figure 3: Screenshots from all games of the SMG-25 dataset: 1) PUBG: Battlegrounds (PUBG Studios, 2017); 2) Payday 3 (Starbreeze Studios, 2023); 3) Insurgency: Sandstorm (New World Interactive, 2021); 4) Call of Duty: MW2 (Infinity Ward, 2022); 5) Far Cry 5 (Ubisoft, 2018); 6) Bioshock Infinite (Irrational Games, 2013); 7) Grand Theft Auto 5 (Rockstar, 2013); 8) Rainbow Six: Siege (Ubisoft, 2015); 9) Team Fortress 2 (Valve, 2007); 10) Wolfenstein (Machine Games, 2014); 11) Apex Legends (Respawn Entertainment, 2019); 12) Atomic Heart (Mundfish, 2023); 13) Warhammer: Vermintide 2 (Fatshark, 2018); 14) Back 4 Blood (Turtle Rock Studios, 2021); 15) Halo 4 (343 Industries, 2012); 16) Crysis 2 (Crytek, 2011); 17) Overwatch 2 (Blizzard Entertainment, 2022); 18) Deathloop (Arkane Lyon, 2021); 19) Valorant (Riot Games, 2020); 20) Generation Zero (Systemic Reaction, 2019); 21) Polygon (Readaster Studio, 2020); 22) Titanfall 2 (Respawn Entertainment, 2016); 23) Destiny 2 (Bungie, 2017); 24) Shatterline (Frag Lab, 2022); 25) Operation Harsh Doorstep (Drakeling Labs, 2023).
  • Figure 4: Behaviour alignment experiments: (a) t-SNE embeddings and corresponding silhouette scores of actions encoded as binary labels (left) compared to pretrained text encoders (right). (b) Effect of varying the number of games in alignment training on behaviour category clustering across 10 test games.
  • Figure 5: Behaviour classification accuracy across 3 behaviour categories when transferring from (a) CS:GO (FPS game) and (b) Minecraft (non-FPS game) to unseen FPS games. Although BehAVE (aligned) encodings perform slightly worse on source domain test sets than foundation (unaligned) encodings, they show significant improvements in generalisation to target domains, highlighting BehAVE's enhanced transfer capacities.
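
The silhouette scores mentioned in the Figure 4 caption measure how tightly encodings cluster by behaviour category. The sketch below is a minimal, standard-library illustration of the mean silhouette coefficient. It is not the authors' evaluation code. The Euclidean distance, the 2-D toy points and the behaviour labels ("aim", "move") are assumptions made for illustration only, and the caption does not say whether the paper computes the score on raw encodings or on t-SNE embeddings.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, in the range [-1, 1]."""
    # Group point indices by their cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy encodings: two behaviours recorded in two games.
labels = ["aim", "aim", "move", "move"]
# Points grouped by behaviour, regardless of game.
grouped_by_behaviour = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
# Points grouped by game, so each behaviour is split across two distant groups.
grouped_by_game = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

score_behaviour = silhouette_score(grouped_by_behaviour, labels)
score_game = silhouette_score(grouped_by_game, labels)
```

In this toy setup the behaviour-grouped points score close to 1, while the game-grouped points score below 0. That is the kind of contrast the silhouette score is meant to capture when encodings cluster by behaviour rather than by game.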