TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

Danijar Hafner; James Davidson; Vincent Vanhoucke

TL;DR

The paper addresses slow reinforcement learning training, where time is spent on both environment interaction and neural network computation, by introducing TensorFlow Agents, an in-graph infrastructure for parallel RL. It runs multiple Gym environments in separate processes and batches their observations, keeping the computation inside the TensorFlow graph to minimize overhead. BatchPPO, an efficient in-graph variant of PPO built on this framework, achieves competitive results on MuJoCo locomotion tasks. By open-sourcing the framework, the authors aim to accelerate RL research and enable scalable experimentation with vectorized environments and algorithms.

Abstract

We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel without interference from the global interpreter lock. As part of this project, we introduce BatchPPO, an efficient implementation of the proximal policy optimization algorithm. By open-sourcing TensorFlow Agents, we hope to provide a flexible starting point for future projects and to accelerate research in the field.
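The pattern described in the abstract, one process per environment with a single batched policy call per step, can be illustrated with a minimal standard-library sketch. This is not the TensorFlow Agents implementation: `CounterEnv`, `worker`, `batched_policy`, and `run` are hypothetical names invented for illustration, the policy is plain Python rather than an in-graph TensorFlow computation, and the sketch assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp


class CounterEnv:
    """Toy stand-in for a Gym environment (hypothetical, not from the paper)."""

    def __init__(self, start):
        self.state = start

    def reset(self):
        return self.state

    def step(self, action):
        self.state += action
        reward = float(action)
        done = self.state >= 10
        return self.state, reward, done


def worker(conn, start):
    """Own one environment in a separate process and serve requests."""
    env = CounterEnv(start)
    while True:
        command, payload = conn.recv()
        if command == "reset":
            conn.send(env.reset())
        elif command == "step":
            conn.send(env.step(payload))
        elif command == "close":
            conn.close()
            break


def batched_policy(observations):
    """Compute actions for the whole batch of observations in one call."""
    return [1 if obs % 2 == 0 else 2 for obs in observations]


def run(num_envs=3, num_steps=2):
    """Step several environments in parallel and return the observations."""
    ctx = mp.get_context("fork")  # Assumption: POSIX system with fork.
    pipes, procs = [], []
    for index in range(num_envs):
        parent, child = ctx.Pipe()
        proc = ctx.Process(target=worker, args=(child, index), daemon=True)
        proc.start()
        pipes.append(parent)
        procs.append(proc)
    for pipe in pipes:
        pipe.send(("reset", None))
    observations = [pipe.recv() for pipe in pipes]
    history = [observations]
    for _ in range(num_steps):
        actions = batched_policy(observations)
        # Send every action before waiting on any result, so the
        # environment processes can step concurrently.
        for pipe, action in zip(pipes, actions):
            pipe.send(("step", action))
        results = [pipe.recv() for pipe in pipes]
        observations = [obs for obs, _, _ in results]
        history.append(observations)
    for pipe in pipes:
        pipe.send(("close", None))
    for proc in procs:
        proc.join()
    return history
```

Calling `run()` returns one list of observations per time step, each with one entry per environment. Because each environment lives in its own process, stepping is not serialized by the parent's global interpreter lock; the parent only gathers observations into a batch and distributes the resulting actions.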
