RLlib: Abstractions for Distributed Reinforcement Learning
Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica
TL;DR
This work addresses the scaling and composition of reinforcement learning (RL) algorithms that exhibit irregular, nested parallelism. It proposes a logically centralized, hierarchical control model and builds RLlib on top of Ray, encapsulating parallelism inside short-running tasks. Three core abstractions (Policy Graph, Policy Evaluator, and Policy Optimizer) unify distributed sampling, evaluation, and updates across diverse algorithms, including Ape-X, PPO, A3C, DQN, ES, and model-based and multi-agent variants. Empirically, RLlib achieves state-of-the-art throughput and scales from a single node to large clusters, with high-throughput Ape-X results and ES runs on up to thousands of cores, while remaining competitive with specialized systems. Because the components are reusable, researchers can prototype complex architectures with minimal code changes.
Abstract
Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available at https://rllib.io/.
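The idea of top-down hierarchical control can be illustrated with a minimal sketch. It is not RLlib's API: it uses only the Python standard library, and every name in it (ToyPolicy, evaluate_task, train) is hypothetical. A thread pool stands in for a cluster scheduler such as Ray, and a one-parameter regression problem stands in for an RL environment. What it aims to show is the control pattern: a single driver loop fans out short-running tasks, gathers their results, and applies the update centrally.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class ToyPolicy:
    """Hypothetical stand-in for a policy: a single scalar weight."""

    def __init__(self, weight=0.0):
        self.weight = weight

    def apply_gradient(self, grad, lr=0.1):
        self.weight -= lr * grad


def evaluate_task(weight, seed, n=32):
    """Short-running task: sample a batch and return a gradient.

    The toy 'environment' emits noisy targets around 1.0; the loss is the
    mean squared error between the weight snapshot and those targets.
    """
    rng = random.Random(seed)
    batch = [1.0 + rng.gauss(0.0, 0.1) for _ in range(n)]
    return sum(2.0 * (weight - x) for x in batch) / n


def train(num_evaluators=4, iterations=50):
    """Driver: the single, logically centralized control loop."""
    policy = ToyPolicy()
    with ThreadPoolExecutor(max_workers=num_evaluators) as pool:
        for it in range(iterations):
            # Fan out one short-running task per evaluator, each with a
            # snapshot of the current weight and its own seed.
            futures = [
                pool.submit(evaluate_task, policy.weight, it * num_evaluators + i)
                for i in range(num_evaluators)
            ]
            # Gather the gradients and apply the averaged update centrally.
            grads = [f.result() for f in futures]
            policy.apply_gradient(sum(grads) / len(grads))
    return policy.weight


if __name__ == "__main__":
    print(train())
```

Because all control flow lives in train, changing the update rule or nesting another level of tasks inside evaluate_task would not require changes to the workers' communication logic, which is the kind of composability the abstract argues for.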
