TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

Danijar Hafner, James Davidson, Vincent Vanhoucke

TL;DR

The paper addresses slow reinforcement learning training, which is bottlenecked by both environment interaction and neural network computation, by introducing TensorFlow Agents, a unified in-graph infrastructure for parallel RL. It steps multiple Gym environments in separate processes and batches their observations, keeping the computation inside the TensorFlow graph to minimize overhead. BatchPPO, an efficient in-graph implementation of PPO built on this framework, achieves results on par with or better than published PPO results on MuJoCo locomotion tasks. By open-sourcing the framework, the authors aim to accelerate future RL research and enable scalable experimentation with vectorized environments and algorithms.

Abstract

We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel without interference of the global interpreter lock. As part of this project, we introduce BatchPPO, an efficient implementation of the proximal policy optimization algorithm. By open sourcing TensorFlow Agents, we hope to provide a flexible starting point for future projects that accelerates future research in the field.
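
The two ideas in the abstract, stepping environments in separate Python processes and running the policy once per batch of observations, can be illustrated with a minimal standard-library sketch. This is not the TensorFlow Agents API: `ToyEnv`, `ProcessBatchEnv`, `batch_policy`, and `rollout` are hypothetical names, the environment is a toy stand-in for a Gym environment, the policy is a plain Python function rather than an in-graph network, and the "fork" start method (Linux/macOS) is assumed.

```python
import multiprocessing as mp

# Assumption: the "fork" start method (Linux/macOS), so no __main__ guard is needed.
_CTX = mp.get_context("fork")


class ToyEnv:
    """Hypothetical stand-in for a Gym environment: a 1-D point that should reach 0."""

    def __init__(self, start):
        self.start = float(start)
        self.pos = self.start

    def reset(self):
        self.pos = self.start
        return [self.pos]

    def step(self, action):
        self.pos += float(action)
        reward = -abs(self.pos)
        done = abs(self.pos) < 1e-6
        return [self.pos], reward, done


def _worker(conn, start):
    """Own one environment in a child process and serve reset/step requests."""
    env = ToyEnv(start)
    while True:
        command, payload = conn.recv()
        if command == "reset":
            conn.send(env.reset())
        elif command == "step":
            conn.send(env.step(payload))
        elif command == "close":
            conn.close()
            break


class ProcessBatchEnv:
    """Step several environments in lock-step, each in its own process."""

    def __init__(self, starts):
        self._conns, self._procs = [], []
        for start in starts:
            parent, child = _CTX.Pipe()
            proc = _CTX.Process(target=_worker, args=(child, start), daemon=True)
            proc.start()
            child.close()  # the parent only needs its own end of the pipe
            self._conns.append(parent)
            self._procs.append(proc)

    def reset(self):
        for conn in self._conns:
            conn.send(("reset", None))
        return [conn.recv() for conn in self._conns]

    def step(self, actions):
        # Send every action first so the environments advance concurrently,
        # then gather the results into one batch.
        for conn, action in zip(self._conns, actions):
            conn.send(("step", action))
        results = [conn.recv() for conn in self._conns]
        observs, rewards, dones = zip(*results)
        return list(observs), list(rewards), list(dones)

    def close(self):
        for conn in self._conns:
            conn.send(("close", None))
        for proc in self._procs:
            proc.join()


def batch_policy(observs):
    """One call for the whole batch; a real agent would run a network here."""
    return [-0.5 * obs[0] for obs in observs]


def rollout(starts, steps):
    """Run the batched policy for a fixed number of steps; return final observations."""
    env = ProcessBatchEnv(starts)
    try:
        observs = env.reset()
        for _ in range(steps):
            observs, _rewards, _dones = env.step(batch_policy(observs))
        return observs
    finally:
        env.close()
```

Calling `rollout([4.0, -2.0, 1.0], steps=3)` halves each position three times. Because every environment lives in its own process, the global interpreter lock of the main process does not serialize the environment steps, while the policy still sees one batch per step.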

Paper Structure

This paper contains 11 sections, 4 equations, 2 figures.

Figures (2)

  • Figure 1: BatchPPO episode returns by environment steps. The blue line indicates performance when using the mean action, while the red line indicates performance when sampling from the action distribution. Our results are on par with or better than published results using the PPO algorithm (Schulman et al., 2017).
  • Figure 2: BatchPPO episode returns by training time in hours using 6 CPU cores. The blue line indicates performance when using the mean action, while the red line indicates performance when sampling from the action distribution. Our implementation can quickly solve challenging locomotion tasks on a single machine.
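
The two curves in each figure differ only in how an action is chosen from the policy output. A minimal sketch of that difference follows, assuming a diagonal Gaussian policy (a common choice for continuous control, not confirmed by the text above); `select_action` and its parameters are hypothetical names used for illustration.

```python
import random


def select_action(mean, stddev, sample, rng=random):
    """Return the mean action, or one draw from N(mean, stddev^2) per dimension.

    Assumption: the policy outputs an independent Gaussian for each action dimension.
    """
    if not sample:
        # Deterministic evaluation: act with the distribution's mean.
        return list(mean)
    # Stochastic behaviour: sample each dimension independently.
    return [rng.gauss(m, s) for m, s in zip(mean, stddev)]
```

With `sample=False` the function returns the mean unchanged, which corresponds to the "mean action" curves. With `sample=True` it adds exploration noise, which corresponds to the curves obtained by sampling from the action distribution.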