Behavioural Cloning in VizDoom
Ryan Spick, Timothy Bradley, Ayush Raina, Pierluigi Vito Amadori, Guy Moss
TL;DR
This work examines behavioural cloning for Doom 2: agents are trained on pixel data via imitation learning (IL) and compared against reinforcement learning (RL) agents to assess humanness. It introduces a CNN/ConvLSTM architecture that ingests RGB, depth, and segmentation channels, uses frame-skipping to extend temporal context, and employs a signed-MSE loss for mouse movement alongside a BCE loss for discrete actions. Experiments show that IL agents can match average human performance and exhibit diverse, human-like behaviours when trained on different players, though RL can surpass IL on raw kill-based metrics. The study highlights practical methods for giving game agents more behavioural depth and human-like traits using end-to-end visual input, without engine data, and suggests directions such as depth estimation to broaden applicability.
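As an illustration of how frame-skipping can extend temporal context, the sketch below picks the frames that would feed a fixed-length recurrent input window. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `skipped_frame_indices`, the `stride` parameter, and the choice to repeat the first frame at the start of an episode are all hypothetical.

```python
from typing import List


def skipped_frame_indices(current: int, seq_len: int, stride: int) -> List[int]:
    """Return the indices of `seq_len` frames ending at `current`.

    Consecutive indices are `stride` frames apart, so the same number of
    input frames covers a longer span of game time than a contiguous
    window would. Indices before the start of the episode are clamped
    to frame 0 (an assumption made for this sketch).
    """
    if seq_len < 1 or stride < 1:
        raise ValueError("seq_len and stride must be positive")
    # Walk backwards from the current frame, then emit oldest-first.
    return [max(0, current - i * stride) for i in reversed(range(seq_len))]


# A 4-frame window with no skipping spans 4 frames of game time...
contiguous = skipped_frame_indices(current=20, seq_len=4, stride=1)
# ...while the same window with a stride of 4 spans 13 frames.
skipped = skipped_frame_indices(current=20, seq_len=4, stride=4)
```

With these values the contiguous window covers frames 17 to 20, whereas the strided window covers frames 8 to 20 at the same input size, which is the sense in which skipping lengthens the context seen by the sequence model.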
Abstract
This paper describes methods for training autonomous agents to play the game "Doom 2" through Imitation Learning (IL) using only pixel data as input. We also explore how Reinforcement Learning (RL) compares to IL in terms of humanness, using camera movement and trajectory data as the basis for comparison. Through behavioural cloning, we examine the ability of individual models to learn varying behavioural traits. We attempt to mimic the behaviour of real players with different play styles, and find that we can train agents that behave aggressively, passively, or simply in a more human-like way than traditional AIs. We propose these methods as a way of introducing more depth and human-like behaviour to agents in video games. The trained IL agents perform on par with the average players in our dataset, whilst outperforming the worst players. Although the IL agents do not perform as strongly as common RL approaches, they exhibit much stronger human-like behavioural traits.
