RLlib: Abstractions for Distributed Reinforcement Learning

Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

TL;DR

This work addresses the scaling and composition of reinforcement learning (RL) algorithms, which exhibit irregular, nested parallelism. It proposes a logically centralized, hierarchical control model and builds RLlib on top of Ray so that parallelism is encapsulated inside short-running tasks. Three core abstractions (Policy Graph, Policy Evaluator, and Policy Optimizer) unify distributed sampling, evaluation, and updates across diverse algorithms, including Ape-X, PPO, A3C, DQN, ES, and model-based and multi-agent variants. Empirically, RLlib's centrally controlled policy optimizers match or exceed the performance of implementations in specialized systems: the Ape-X policy optimizer scales to 160k frames per second with 256 workers, and ES scales to thousands of cores. Because components are reusable, researchers can prototype complex architectures with minimal code changes.

Abstract

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available at https://rllib.io/.

Paper Structure

This paper contains 17 sections, 4 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: In contrast with deep learning, RL algorithms leverage parallelism at multiple levels and physical devices. Here, we show an RL algorithm composing derivative-free optimization, policy evaluation, gradient-based optimization, and model-based planning (Table \ref{table:components}).
  • Figure 2: Most RL algorithms today are written in a fully distributed style (a) where replicated processes independently compute and coordinate with each other according to their roles (if any). We propose a hierarchical control model (c), which extends (b) to support nesting in RL and hyperparameter tuning workloads, simplifying and unifying the programming models used for implementation.
  • Figure 3: Composing a distributed hyperparameter search with a function that also requires distributed computation involves complex nested parallel computation patterns. With MPI (a), a new program must be written from scratch that mixes elements of both. With hierarchical control (b), components can remain unchanged and simply be invoked as remote tasks.
  • Figure 4: Pseudocode for four RLlib policy optimizer step methods. Each step() operates over a local policy graph and array of remote evaluator replicas. Ray remote calls are highlighted in orange; other Ray primitives in blue (Section \ref{sec:requirements}). Apply is shorthand for updating weights. Minibatch code and helper functions omitted. The parameter server optimizer in RLlib also implements pipelining not shown here.
  • Figure 5: RLlib's centrally controlled policy optimizers match or exceed the performance of implementations in specialized systems. The RLlib parameter server optimizer using 8 internal shards is competitive with a Distributed TensorFlow implementation tested in similar conditions. RLlib's Ape-X policy optimizer scales to 160k frames per second with 256 workers at a frameskip of 4, more than matching a reference throughput of $\sim$45k fps at 256 workers, demonstrating that a single-threaded Python controller can efficiently scale to high throughputs.
  • ...and 3 more figures
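
The Figure 4 caption describes each policy optimizer step() as operating over a local policy graph and an array of remote evaluator replicas. The sketch below illustrates that control flow for a synchronous, gradient-averaging step. It is not RLlib's code and it does not use Ray: a standard-library thread pool stands in for remote tasks, and every name (ToyPolicy, ToyEvaluator, sync_step, run_demo) and the toy objective are assumptions made for illustration only.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class ToyPolicy:
    """Hypothetical stand-in for a policy graph: one scalar weight."""

    def __init__(self, weight=0.0):
        self.weight = weight

    def get_weights(self):
        return self.weight

    def set_weights(self, weight):
        self.weight = weight

    def compute_gradient(self, batch):
        # Gradient of the mean squared error between the weight and the samples.
        return sum(2.0 * (self.weight - x) for x in batch) / len(batch)

    def apply_gradient(self, grad, lr=0.1):
        self.weight -= lr * grad


class ToyEvaluator:
    """Hypothetical stand-in for an evaluator replica holding its own policy copy."""

    def __init__(self, seed):
        self.policy = ToyPolicy()
        self.rng = random.Random(seed)

    def set_weights(self, weight):
        self.policy.set_weights(weight)

    def sample(self, n=32):
        # Toy "experience": noisy observations of the target value 1.0.
        return [self.rng.gauss(1.0, 0.1) for _ in range(n)]

    def sample_and_compute_gradient(self):
        return self.policy.compute_gradient(self.sample())


def sync_step(local_policy, evaluators, pool):
    """One centrally controlled step: broadcast weights, gather gradients, apply."""
    weights = local_policy.get_weights()
    for ev in evaluators:
        ev.set_weights(weights)
    # Each submit() plays the role of a short-running remote task.
    futures = [pool.submit(ev.sample_and_compute_gradient) for ev in evaluators]
    grads = [f.result() for f in futures]
    avg_grad = sum(grads) / len(grads)
    local_policy.apply_gradient(avg_grad)
    return avg_grad


def run_demo(steps=50, num_evaluators=4):
    """Drive the loop from a single controller and return the final weight."""
    local_policy = ToyPolicy()
    evaluators = [ToyEvaluator(seed=i) for i in range(num_evaluators)]
    with ThreadPoolExecutor(max_workers=num_evaluators) as pool:
        for _ in range(steps):
            sync_step(local_policy, evaluators, pool)
    return local_policy.get_weights()
```

Calling run_demo() drives the whole loop from one controller thread, and the local weight should settle near the toy target of 1.0. All parallelism lives inside sync_step, so a caller such as a hyperparameter search could invoke it as an ordinary function, which is the kind of composition the hierarchical control model is meant to allow.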