Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU
Mohammad Babaeizadeh, Iuri Frosio, Stephen Tyree, Jason Clemons, Jan Kautz
TL;DR
GA3C addresses the computational bottlenecks of asynchronous deep RL by centralizing the neural network on a GPU and decoupling data generation from training through prediction and training queues. It introduces a dynamic scheduling strategy that adapts the number of predictors (NP), trainers (NT), and agents (NA) to maximize GPU throughput while maintaining learning stability. Empirical results on Atari 2600 games show substantially higher training throughput and faster convergence than a CPU implementation of A3C, with the speedup growing with neural network size. The work also analyzes latency, queue dynamics, and the trade-off between trainings per second (TPS) and predictions per second (PPS), and releases open-source code to facilitate adoption and further research.
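The queue-based decoupling of data generation from training can be illustrated with a minimal, CPU-only sketch. It assumes Python's standard `threading` and `queue` modules in place of the TensorFlow/GPU code. Agents push states to a prediction queue and wait for a policy, predictor threads drain that queue into one batched call, and trainer threads consume experience from a training queue. `FakeModel`, `MAX_BATCH`, `T_MAX`, the toy reward, and every function name are hypothetical and are not taken from the GA3C repository.

```python
import queue
import random
import threading

MAX_BATCH = 32  # hypothetical cap on the prediction batch size
T_MAX = 5       # hypothetical rollout length before an agent ships experience


class FakeModel:
    """Hypothetical stand-in for the GPU-resident network (no GPU or TensorFlow here)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.predict_batches = []  # size of every batched predict() call
        self.trained_samples = 0

    def predict(self, states):
        with self.lock:
            self.predict_batches.append(len(states))
        # Uniform policy over two actions and a zero value estimate per state.
        return [([0.5, 0.5], 0.0) for _ in states]

    def train(self, experiences):
        with self.lock:
            self.trained_samples += len(experiences)


def agent(agent_id, steps, prediction_q, training_q, reply_q):
    rollout = []
    for t in range(steps):
        state = random.random()                 # toy observation
        prediction_q.put((agent_id, state))     # request a policy from the predictors
        policy, value = reply_q.get()           # block until the prediction arrives
        action = random.choices([0, 1], weights=policy)[0]
        reward = 1.0 if action == 1 else 0.0    # toy environment
        rollout.append((state, action, reward, value))
        if len(rollout) == T_MAX or t == steps - 1:
            training_q.put(rollout)             # hand the rollout to the trainers
            rollout = []


def predictor(model, prediction_q, reply_qs, stop):
    while not stop.is_set():
        try:
            batch = [prediction_q.get(timeout=0.01)]
        except queue.Empty:
            continue
        # Drain any other pending requests so the model sees one larger batch.
        while len(batch) < MAX_BATCH:
            try:
                batch.append(prediction_q.get_nowait())
            except queue.Empty:
                break
        ids, states = zip(*batch)
        for agent_id, result in zip(ids, model.predict(list(states))):
            reply_qs[agent_id].put(result)


def trainer(model, training_q, stop):
    # Keep consuming until stopped and the training queue has been emptied.
    while not stop.is_set() or not training_q.empty():
        try:
            rollout = training_q.get(timeout=0.01)
        except queue.Empty:
            continue
        model.train(rollout)


def run(n_agents=8, n_predictors=2, n_trainers=2, steps=20):
    model = FakeModel()
    prediction_q, training_q = queue.Queue(), queue.Queue()
    reply_qs = [queue.Queue() for _ in range(n_agents)]
    stop = threading.Event()
    workers = [threading.Thread(target=predictor, args=(model, prediction_q, reply_qs, stop))
               for _ in range(n_predictors)]
    workers += [threading.Thread(target=trainer, args=(model, training_q, stop))
                for _ in range(n_trainers)]
    agents = [threading.Thread(target=agent,
                               args=(i, steps, prediction_q, training_q, reply_qs[i]))
              for i in range(n_agents)]
    for th in workers + agents:
        th.start()
    for th in agents:
        th.join()   # every rollout is queued once all agents have returned
    stop.set()
    for th in workers:
        th.join()
    return model


model = run()
print(model.trained_samples, max(model.predict_batches))
```

In this sketch each agent has at most one outstanding prediction request, so a prediction batch can never exceed the number of agents. That hints at why the number of agents affects how well the batched model call can be utilized.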
Abstract
We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on the aspects critical to leveraging the computational power of the GPU. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speedup compared to a CPU implementation; we make it publicly available to other researchers at https://github.com/NVlabs/GA3C.
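The dynamic scheduling strategy can be sketched as a simple hill-climbing search over the number of predictors (NP), trainers (NT), and agents (NA). At each step, it perturbs one of them by one and keeps the new configuration only if the measured training throughput (TPS) improves. This search rule is an illustrative assumption, not necessarily the exact procedure used in GA3C. `hill_climb`, `synthetic_tps`, its optimum, and the bounds are all hypothetical.

```python
import random


def hill_climb(measure_tps, config, steps=1000, bounds=(1, 32), seed=0):
    """Perturb one of NP, NT, NA by +/-1 per step; keep the change only if TPS rises.

    Hypothetical sketch of a dynamic scheduler, not GA3C's exact procedure.
    """
    rng = random.Random(seed)
    best_tps = measure_tps(config)
    for _ in range(steps):
        key = rng.choice(sorted(config))           # which knob to perturb
        candidate = dict(config)
        delta = rng.choice((-1, 1))
        candidate[key] = min(max(candidate[key] + delta, bounds[0]), bounds[1])
        tps = measure_tps(candidate)               # in practice: trainings/s over a window
        if tps > best_tps:
            config, best_tps = candidate, tps
    return config, best_tps


def synthetic_tps(c):
    # Toy objective (hypothetical): throughput peaks at NP=4, NT=2, NA=16.
    return 1000 - (c["NP"] - 4) ** 2 - (c["NT"] - 2) ** 2 - (c["NA"] - 16) ** 2


best_config, best_tps = hill_climb(synthetic_tps, {"NP": 1, "NT": 1, "NA": 1})
print(best_config, best_tps)
```

In a live system each evaluation costs wall-clock time and TPS measurements are noisy, so a practical scheduler would need to average over a window or tolerate small regressions. The toy objective here is deterministic and separable, which lets the greedy search reach its optimum.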
