Behavioural Cloning in VizDoom

Ryan Spick, Timothy Bradley, Ayush Raina, Pierluigi Vito Amadori, Guy Moss

TL;DR

This work examines behavioural cloning for Doom 2 by training agents on pixel data via imitation learning (IL) and contrasting them with reinforcement learning (RL) agents to assess humanness. It introduces a CNN/ConvLSTM architecture that ingests RGB, depth, and segmentation channels, uses frame-skipping to extend temporal context, and employs a signed-MSE loss for mouse movement alongside BCE for discrete actions. Experiments show that IL agents can match average human performance and exhibit diverse, human-like behaviours when trained on different players, though RL can surpass IL on raw kill-based metrics. The study highlights practical methods for adding depth and human-like traits to game agents using end-to-end visual input, without engine data, and suggests directions such as depth estimation to broaden applicability.
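The frame-skipping idea mentioned above can be illustrated with a minimal sketch. It assumes the model sees a fixed-length sequence of frames and that skipping means sampling every `skip`-th frame backwards from the current one; the function names, the clamping at the first frame, and the example values are hypothetical and not taken from the paper.

```python
def frame_skip_indices(current, seq_len, skip):
    """Return seq_len frame indices ending at `current`, spaced `skip` apart.

    Indices that would fall before the start of the recording are clamped
    to 0 (an assumption made for this sketch, not the paper's stated rule).
    """
    return [max(0, current - skip * i) for i in reversed(range(seq_len))]


def temporal_span(seq_len, skip):
    """Number of raw frames covered by a skipped sequence, endpoints included."""
    return (seq_len - 1) * skip + 1


if __name__ == "__main__":
    # Four input frames with a skip of 4 cover 13 raw frames of history.
    print(frame_skip_indices(20, 4, 4))
    print(temporal_span(4, 4))
```

Under these assumptions the network still processes only `seq_len` frames per step, while the history it can see grows linearly with `skip`.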

Abstract

This paper describes methods for training autonomous agents to play the game "Doom 2" through Imitation Learning (IL) using only pixel data as input. We also explore how Reinforcement Learning (RL) compares to IL for humanness by comparing camera movement and trajectory data. Through behavioural cloning, we examine the ability of individual models to learn varying behavioural traits. We attempt to mimic the behaviour of real players with different play styles, and find we can train agents that behave aggressively, passively, or simply more human-like than traditional AIs. We propose these methods as a way of introducing more depth and human-like behaviour to agents in video games. The trained IL agents perform on par with the average players in our dataset, whilst outperforming the worst players. While IL performance was not as strong as that of common RL approaches, IL gives the agent much stronger human-like behavioural traits.

Paper Structure (21 sections, 9 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Trajectories showing each individual player's movement in the deathmatch environment. Each player's username has been anonymised to a random name; these names are referenced when discussing individual players' metrics and agent training.
  • Figure 2: a) Agent trajectory heatmap from a model trained on the top 3 players' data. b) Agent trajectory heatmap from a model trained on the bottom 3 players' combined data.
  • Figure 3: Trajectories of agents each trained on a single player's data. Trajectories are from a single game with 6 bot players, to most closely match the data captured from the player deathmatches.
  • Figure 4: Spatial heatmap of the RL agent's behaviour
  • Figure 5: Histograms and Gaussian approximations comparing camera-movement behaviours: a) angular velocity, b) angular acceleration, c) angular jerk.
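
The three quantities compared in Figure 5 are successive time derivatives of the camera angle. A minimal sketch of how they could be computed from a logged yaw trace is shown below; it assumes yaw is recorded in degrees at a fixed time step `dt` and uses simple finite differences with wrap-around handling. The function names and the sample trace are hypothetical, and the paper's own preprocessing may differ.

```python
def wrap_angle(delta_deg):
    """Wrap an angle difference into [-180, 180) so 355 -> 5 counts as +10."""
    return (delta_deg + 180.0) % 360.0 - 180.0


def finite_difference(values, dt):
    """First-order forward difference of a uniformly sampled signal."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def camera_kinematics(yaw_deg, dt):
    """Return (velocity, acceleration, jerk) lists for a yaw trace in degrees.

    Each derivative is one element shorter than the signal it is taken from.
    """
    velocity = [wrap_angle(b - a) / dt for a, b in zip(yaw_deg, yaw_deg[1:])]
    acceleration = finite_difference(velocity, dt)
    jerk = finite_difference(acceleration, dt)
    return velocity, acceleration, jerk


if __name__ == "__main__":
    # Hypothetical yaw samples that cross the 360 -> 0 boundary.
    vel, acc, jerk = camera_kinematics([350.0, 355.0, 5.0, 20.0, 20.0], dt=1.0)
    print(vel, acc, jerk)
```

Histograms of these per-step values, gathered separately for human, IL, and RL traces, would give distributions of the kind the figure compares.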