Pearl: A Production-ready Reinforcement Learning Agent

Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu

TL;DR

This paper introduces Pearl, a Production-Ready RL software package that addresses key challenges of deploying RL (exploration, partial observability, dynamic action spaces, and safety) in a modular way, and highlights examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases.

Abstract

Reinforcement learning (RL) is a versatile framework for optimizing long-term goals. Although many real-world problems can be formalized with RL, learning and deploying a performant RL policy requires a system designed to address several important challenges, including the exploration-exploitation dilemma, partial observability, dynamic action spaces, and safety concerns. While the importance of these challenges has been well recognized, existing open-source RL libraries do not explicitly address them. This paper introduces Pearl, a Production-Ready RL software package designed to embrace these challenges in a modular way. In addition to presenting benchmarking results, we also highlight examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases. Pearl is open-sourced on GitHub at github.com/facebookresearch/pearl, and its official website is pearlagent.github.io.
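The abstract's central design claim is modularity: a challenge such as exploration is handled by a swappable component instead of being baked into the learner. The following is a minimal plain-Python sketch of that idea under stated assumptions. The class and method names (`ModularBanditAgent`, `LinearArmModel`, `GreedyExploration`, `LinUCBExploration`, `act`, `learn`) are hypothetical and are not Pearl's API, and a per-action linear ridge-regression reward model stands in for the neural contextual bandit learner benchmarked in Figure 4. One reward model is paired with two interchangeable exploration modules, one of which is LinUCB.

```python
import math
from typing import List, Protocol, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: List[List[float]], x: Sequence[float]) -> Vector:
    return [dot(row, x) for row in m]


class LinearArmModel:
    """Hypothetical per-action ridge-regression reward model (not Pearl's API)."""

    def __init__(self, dim: int, reg: float = 1.0) -> None:
        # A = reg * I, stored directly as its inverse A^{-1} = I / reg.
        self.a_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def theta(self) -> Vector:
        # Ridge estimate: theta = A^{-1} b.
        return matvec(self.a_inv, self.b)

    def update(self, x: Sequence[float], reward: float) -> None:
        # Sherman-Morrison rank-one update for A <- A + x x^T (A^{-1} is symmetric):
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        ax = matvec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        for i, row in enumerate(self.a_inv):
            for j in range(len(row)):
                row[j] -= ax[i] * ax[j] / denom
        for i, xi in enumerate(x):
            self.b[i] += reward * xi


class ExplorationModule(Protocol):
    def select(self, arms: Sequence[LinearArmModel], x: Sequence[float]) -> int: ...


class GreedyExploration:
    """Pure exploitation: pick the action with the highest predicted reward."""

    def select(self, arms: Sequence[LinearArmModel], x: Sequence[float]) -> int:
        scores = [dot(arm.theta(), x) for arm in arms]
        return max(range(len(arms)), key=scores.__getitem__)


class LinUCBExploration:
    """Optimism under uncertainty: predicted reward + alpha * sqrt(x^T A^{-1} x)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def select(self, arms: Sequence[LinearArmModel], x: Sequence[float]) -> int:
        scores = [
            dot(arm.theta(), x) + self.alpha * math.sqrt(dot(x, matvec(arm.a_inv, x)))
            for arm in arms
        ]
        return max(range(len(arms)), key=scores.__getitem__)


class ModularBanditAgent:
    """Hypothetical agent: reward models are fixed, the exploration module is pluggable."""

    def __init__(self, num_actions: int, dim: int, exploration: ExplorationModule) -> None:
        self.arms = [LinearArmModel(dim) for _ in range(num_actions)]
        self.exploration = exploration

    def act(self, x: Sequence[float]) -> int:
        return self.exploration.select(self.arms, x)

    def learn(self, x: Sequence[float], action: int, reward: float) -> None:
        self.arms[action].update(x, reward)


if __name__ == "__main__":
    # Same learner, two exploration modules, identical data: 10 observations of action 0.
    for module in (GreedyExploration(), LinUCBExploration(alpha=1.0)):
        agent = ModularBanditAgent(num_actions=2, dim=2, exploration=module)
        for _ in range(10):
            agent.learn([1.0, 0.0], action=0, reward=0.5)
        print(type(module).__name__, agent.act([1.0, 0.0]))
    # prints "GreedyExploration 0" then "LinUCBExploration 1"
```

With the same data, the greedy module keeps exploiting action 0 (predicted reward 5/11), while LinUCB tries the untested action 1 because its uncertainty bonus (1.0) exceeds action 0's upper bound (about 0.76). Only the exploration component changes between the two runs, which is the kind of separation the abstract describes.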

Paper Structure

This paper contains 47 sections, 5 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Interface of Pearl Agent
  • Figure 2: Training returns for discrete control methods on the CartPole task. The left and right panels show returns for value- and policy-based methods, respectively.
  • Figure 3: Training returns for Continuous SAC, DDPG and TD3 on four continuous control tasks in MuJoCo.
  • Figure 4: Performance of the neural contextual bandit policy learner in Pearl with Linear UCB, Thompson Sampling, and SquareCB-based exploration modules on four UCI datasets. An offline baseline, considered near-optimal, is included for comparison.
  • Figure 5: Agent Versatility Benchmark Results: learning curves for (a) DQN, with and without LSTM, in the partially observable CartPole environment, (b) DQN and Bootstrapped DQN in a $10\times10$ Deep Sea environment, and (c) QR-DQN with a mean-variance risk-sensitive safety module. Under a high degree of risk aversion, the learned policy has a smaller expected return but also a smaller variance of the return distribution.
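Figure 5(c) pairs QR-DQN with a mean-variance risk-sensitive safety module. QR-DQN represents each action's return distribution by a set of equally weighted quantile estimates, so one natural way to express risk aversion is to rank actions by the mean of those quantiles minus a coefficient times their variance. The following is a minimal sketch of that selection rule under that assumption; the function names and the coefficient `beta` are hypothetical, and Pearl's safety module may compute its objective differently.

```python
from typing import Sequence


def mean_variance_score(quantiles: Sequence[float], beta: float) -> float:
    """Risk-adjusted value: mean - beta * variance of equally weighted quantiles."""
    n = len(quantiles)
    mean = sum(quantiles) / n
    variance = sum((q - mean) ** 2 for q in quantiles) / n
    return mean - beta * variance


def select_action(quantiles_per_action: Sequence[Sequence[float]], beta: float) -> int:
    """Pick the action whose return distribution maximizes the mean-variance score."""
    scores = [mean_variance_score(q, beta) for q in quantiles_per_action]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical quantile estimates for two actions (e.g., outputs of a QR-DQN head).
    safe = [1.0, 1.0, 1.0, 1.0]     # mean 1.0, variance 0.0
    risky = [-2.0, 0.0, 3.0, 7.0]   # mean 2.0, variance 11.5
    print(select_action([safe, risky], beta=0.0))  # 1: risk-neutral, prefers the higher mean
    print(select_action([safe, risky], beta=0.1))  # 0: risky scores 2.0 - 1.15 = 0.85 < 1.0
```

Raising `beta` trades expected return for lower variance, which mirrors the qualitative effect described in the Figure 5 caption.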