Pearl: A Production-ready Reinforcement Learning Agent

Zheqing Zhu; Rodrigo de Salvo Braz; Jalaj Bhandari; Daniel Jiang; Yi Wan; Yonathan Efroni; Liyuan Wang; Ruiyang Xu; Hongbo Guo; Alex Nikulkov; Dmytro Korenkevych; Urun Dogan; Frank Cheng; Zheng Wu; Wanqiao Xu

TL;DR

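This paper introduces Pearl, a Production-Ready RL software package designed to address, in a modular way, the key challenges of deploying RL in practice (the exploration-exploitation dilemma, partial observability, dynamic action spaces, and safety), and highlights examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases.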

Abstract

Reinforcement learning (RL) is a versatile framework for optimizing long-term goals. Although many real-world problems can be formalized with RL, learning and deploying a performant RL policy requires a system designed to address several important challenges, including the exploration-exploitation dilemma, partial observability, dynamic action spaces, and safety concerns. While the importance of these challenges has been well recognized, existing open-source RL libraries do not explicitly address them. This paper introduces Pearl, a Production-Ready RL software package designed to tackle these challenges in a modular way. In addition to presenting benchmarking results, we highlight examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases. Pearl is open-sourced on GitHub at github.com/facebookresearch/pearl, and its official website is pearlagent.github.io.
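To make the modular design more concrete, the following is a minimal, self-contained Python sketch of an agent assembled from the four modules named in the paper's outline (policy learner, exploration, safety, and history summarization). All class and method names are hypothetical and are not Pearl's API. To keep it short, the policy learner is only a one-step (bandit-style) reward estimator, the safety module simply removes a fixed set of forbidden actions, the history summarizer keeps a window of recent observations, and the available actions are passed in on every `act` call to mimic a dynamic action space.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# Minimal sketch of a modular agent. All names are hypothetical and are NOT
# Pearl's API; they only mirror the module split described in the paper.


class WindowHistorySummarizer:
    """History summarization: the state is the last `window` observations."""

    def __init__(self, window: int = 3) -> None:
        self.window = window
        self.history: List[float] = []

    def summarize(self, observation: float) -> Tuple[float, ...]:
        self.history = (self.history + [observation])[-self.window:]
        return tuple(self.history)


class ForbiddenActionSafety:
    """Safety: drop actions that have been marked unsafe."""

    def __init__(self, forbidden: Sequence[Hashable]) -> None:
        self.forbidden = set(forbidden)

    def filter(self, actions: Sequence[Hashable]) -> List[Hashable]:
        return [a for a in actions if a not in self.forbidden]


class EpsilonGreedyExploration:
    """Exploration: a random safe action with probability epsilon, else greedy."""

    def __init__(self, epsilon: float, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self, q_values: Dict[Hashable, float]) -> Hashable:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(q_values))
        return max(q_values, key=q_values.get)  # ties go to the first action


class TabularPolicyLearner:
    """Policy learner: running estimate of the immediate reward per (state, action)."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.q: Dict[Tuple[Hashable, Hashable], float] = {}

    def q_values(self, state: Hashable, actions: Sequence[Hashable]) -> Dict[Hashable, float]:
        return {a: self.q.get((state, a), 0.0) for a in actions}

    def learn(self, state: Hashable, action: Hashable, reward: float) -> None:
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.lr * (reward - old)


class ModularAgent:
    def __init__(self, learner, exploration, safety, summarizer) -> None:
        self.learner = learner
        self.exploration = exploration
        self.safety = safety
        self.summarizer = summarizer
        self._last: Optional[Tuple[Hashable, Hashable]] = None

    def act(self, observation: float, available_actions: Sequence[Hashable]) -> Hashable:
        # The action set is supplied on every call, so it may change between steps.
        state = self.summarizer.summarize(observation)
        safe_actions = self.safety.filter(available_actions)
        action = self.exploration.select(self.learner.q_values(state, safe_actions))
        self._last = (state, action)
        return action

    def observe(self, reward: float) -> None:
        state, action = self._last
        self.learner.learn(state, action, reward)


# Toy run: action 1 pays 1.0, action 2 is unsafe, and the action set alternates.
agent = ModularAgent(TabularPolicyLearner(), EpsilonGreedyExploration(0.2),
                     ForbiddenActionSafety([2]), WindowHistorySummarizer())
chosen = []
for t in range(300):
    actions = [0, 1, 2] if t % 2 == 0 else [0, 1]
    action = agent.act(0.0, actions)
    chosen.append(action)
    agent.observe(1.0 if action == 1 else 0.0)
```

Because each concern sits behind its own small interface, one module (for example, the epsilon-greedy explorer) can be swapped for another without touching the learner, the safety filter, or the summarizer.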

Paper Structure (47 sections, 5 equations, 10 figures, 3 tables)

Introduction
Pearl Agent Design
Policy learner module
Exploration module
Safety module
History summarization module
Comparison to Existing Libraries
Adoption of Pearl in Industry Applications
Auction-based recommender systems
Ads auction bidding
Creative selection
Using Pearl
Architecture, Modularity and API design
Handling Errors and Edge Cases
Benchmark Results

Figures (10)

Figure 1: Interface of Pearl Agent
Figure 2: Training returns for discrete control methods on the CartPole task. The left and right panels show returns for value- and policy-based methods, respectively.
Figure 3: Training returns for Continuous SAC, DDPG and TD3 on four continuous control tasks in MuJoCo.
Figure 4: Performance of the neural contextual bandit policy learner with Linear UCB-, Thompson Sampling-, and SquareCB-based exploration modules in Pearl on four UCI datasets. We also include an offline baseline that is considered near-optimal.
Figure 5: Agent Versatility Benchmark Results: learning curves for (a) DQN, with and without LSTM, in the partially observable CartPole environment, (b) DQN and Bootstrapped DQN in a $10\times10$ Deep Sea environment, and (c) QR-DQN with a mean-variance risk-sensitive safety module. Under a high degree of risk aversion, the learned policy has a lower expected return but also a smaller variance of the return distribution.
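Figure 4 compares exploration modules for a neural contextual bandit learner. As an illustration of one of them, below is a minimal sketch of the inverse gap weighting rule used by SquareCB: given predicted rewards $\hat{y}_a$ for $K$ actions and the greedy action $b$, each action $a \neq b$ is played with probability $1/(K + \gamma(\hat{y}_b - \hat{y}_a))$, and $b$ receives the remaining probability mass. This is not Pearl's implementation; the function names and the scalar parameter `gamma` are our own choices, and in practice the predicted rewards would come from the learned (e.g. neural) reward model.

```python
import random
from typing import List, Sequence


def inverse_gap_weighting(predicted_rewards: Sequence[float], gamma: float) -> List[float]:
    """Action probabilities under inverse gap weighting (reward maximization)."""
    k = len(predicted_rewards)
    best = max(range(k), key=lambda a: predicted_rewards[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]  # always >= 0
            probs[a] = 1.0 / (k + gamma * gap)
    # The greedy action gets the leftover mass, which is at least 1/k.
    probs[best] = 1.0 - sum(probs)
    return probs


def squarecb_select(predicted_rewards: Sequence[float], gamma: float, rng: random.Random) -> int:
    """Sample an action index from the inverse-gap-weighted distribution."""
    probs = inverse_gap_weighting(predicted_rewards, gamma)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = inverse_gap_weighting([1.0, 0.5, 0.0], gamma=10.0)
action = squarecb_select([1.0, 0.5, 0.0], gamma=10.0, rng=random.Random(0))
```

With `gamma = 0` the rule reduces to uniform exploration; as `gamma` grows, probability concentrates on the greedy action, while actions with small predicted gaps are still tried more often than clearly worse ones.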
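Figure 5(c) pairs QR-DQN with a mean-variance risk-sensitive safety module. The sketch below shows one way such a module could rank actions, under our own assumption that each action is scored by the mean of its equally weighted quantile estimates minus a risk-aversion coefficient `beta` times their variance. The function name, `beta`, and the quantile values are hypothetical; the exact objective used by Pearl is not stated here.

```python
from statistics import fmean, pvariance
from typing import Dict, Sequence


def mean_variance_action(quantiles: Dict[str, Sequence[float]], beta: float) -> str:
    """Pick the action maximizing mean - beta * variance of its return quantiles.

    QR-DQN represents each action's return distribution by N quantile estimates
    with equal weight 1/N, so their plain mean and population variance
    approximate the distribution's mean and variance.
    """
    def score(action: str) -> float:
        q = quantiles[action]
        return fmean(q) - beta * pvariance(q)

    return max(quantiles, key=score)


# Hypothetical quantile estimates: "risky" has a higher mean but a much wider spread.
quantiles = {"safe": [0.9, 1.0, 1.1], "risky": [-2.0, 1.5, 5.0]}
neutral_choice = mean_variance_action(quantiles, beta=0.0)
averse_choice = mean_variance_action(quantiles, beta=1.0)
```

For a single decision this mirrors the trade-off described in the caption: the risk-neutral choice has the higher expected return, while a larger `beta` prefers the action whose return distribution has a lower mean but a much smaller variance.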
