Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

Mohammad Babaeizadeh; Iuri Frosio; Stephen Tyree; Jason Clemons; Jan Kautz

TL;DR

GA3C addresses the computational bottlenecks of asynchronous deep RL by centralizing the neural network on a GPU and decoupling data generation from training through prediction and training queues. It introduces dynamic scheduling that adapts the number of predictors (NP), trainers (NT), and agents (NA) to maximize GPU throughput while maintaining learning stability. Empirical results show higher training throughput and faster convergence on Atari 2600 tasks than a CPU implementation of A3C, with the gains scaling with neural network size. The work analyzes latency, queue dynamics, and the trade-off between trainings per second (TPS) and predictions per second (PPS), and provides open-source code to facilitate broader adoption and further research.
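The summary above does not spell out the scheduling rule, so the following is only a minimal sketch of one way such tuning could work: perturb the predictor, trainer, and agent counts by one, measure throughput, and keep the change only if throughput improves. The names `tune`, `synthetic_tps`, and the dictionary keys are hypothetical, and `synthetic_tps` is a made-up stand-in for measuring trainings per second on a live system.

```python
import random


def synthetic_tps(config):
    # Hypothetical throughput model with a single peak; a real system would
    # measure trainings per second over a time window instead.
    return (100
            - (config["n_predictors"] - 2) ** 2
            - (config["n_trainers"] - 2) ** 2
            - (config["n_agents"] - 8) ** 2)


def tune(config, measure_tps, n_rounds, rng):
    """Greedy search: accept a random +/-1 perturbation only if TPS improves."""
    best = dict(config)              # copy so the caller's dict is not mutated
    best_tps = measure_tps(best)
    for _ in range(n_rounds):
        # Perturb every count by +1 or -1, never dropping below one worker.
        candidate = {k: max(1, v + rng.choice((-1, 1))) for k, v in best.items()}
        tps = measure_tps(candidate)
        if tps > best_tps:           # keep the new setting, otherwise revert
            best, best_tps = candidate, tps
    return best, best_tps


if __name__ == "__main__":
    start = {"n_predictors": 1, "n_trainers": 1, "n_agents": 4}
    print(tune(start, synthetic_tps, 200, random.Random(0)))
```

In practice the throughput measurement is noisy, so a scheduler of this kind would need to average over a long enough window before accepting or rejecting a setting.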

Abstract

We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speedup compared to a CPU implementation; we make it publicly available to other researchers at https://github.com/NVlabs/GA3C .
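The system of queues can be illustrated with a small, self-contained sketch. It is not the GA3C code: it assumes plain Python threads for every role, a toy environment with a constant reward, and a hypothetical `toy_policy` function standing in for a batched forward pass of the network. Agents push states onto a prediction queue and wait for an action; a predictor batches those requests; a trainer consumes experiences from a training queue.

```python
import queue
import threading


def toy_policy(states):
    # Stand-in for one batched forward pass of the shared network.
    return [s % 2 for s in states]


def agent(agent_id, n_steps, pred_q, reply_q, train_q):
    state = 0
    for _ in range(n_steps):
        pred_q.put((agent_id, state))         # request an action
        action = reply_q.get()                # block until the predictor replies
        reward = 1.0                          # toy environment: constant reward
        train_q.put((state, action, reward))  # hand the experience to the trainer
        state += 1


def drain(q, batch_size):
    """Return up to batch_size items, or an empty list if none arrive in time."""
    try:
        batch = [q.get(timeout=0.05)]
    except queue.Empty:
        return []
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def predictor(pred_q, reply_qs, batch_size, stop):
    while not stop.is_set():
        batch = drain(pred_q, batch_size)
        if not batch:
            continue
        actions = toy_policy([state for _, state in batch])
        for (agent_id, _), action in zip(batch, actions):
            reply_qs[agent_id].put(action)    # route each action to its agent


def trainer(train_q, batch_size, stop, consumed):
    # Keep running until asked to stop and the queue has been emptied.
    while not (stop.is_set() and train_q.empty()):
        consumed.extend(drain(train_q, batch_size))  # a real trainer would update the model here


def run_demo(n_agents=4, n_steps=5, batch_size=8):
    pred_q, train_q = queue.Queue(), queue.Queue()
    reply_qs = [queue.Queue() for _ in range(n_agents)]
    stop, consumed = threading.Event(), []
    workers = [
        threading.Thread(target=predictor, args=(pred_q, reply_qs, batch_size, stop)),
        threading.Thread(target=trainer, args=(train_q, batch_size, stop, consumed)),
    ]
    agents = [
        threading.Thread(target=agent, args=(i, n_steps, pred_q, reply_qs[i], train_q))
        for i in range(n_agents)
    ]
    for t in workers + agents:
        t.start()
    for t in agents:
        t.join()
    stop.set()                                # agents are done; let workers finish
    for t in workers:
        t.join()
    return consumed


if __name__ == "__main__":
    print(len(run_demo()))
```

Because agents never touch the model directly, the only component that needs the GPU in this layout is the code behind `toy_policy` and the trainer's update step, which is what allows requests from many agents to be served in batches.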
