Federated Ensemble-Directed Offline Reinforcement Learning

Desik Rengarajan; Nitin Ragothaman; Dileep Kalathil; Srinivas Shakkottai

TL;DR

The authors develop FEDORA, an algorithm and accompanying codebase that distills the collective wisdom of the clients using an ensemble learning approach. They show that FEDORA significantly outperforms other approaches, including offline RL over the combined data pool, in various complex continuous control environments and real-world datasets.

Abstract

We consider the problem of federated offline reinforcement learning (RL), a scenario under which distributed learning agents must collaboratively learn a high-quality control policy using only small pre-collected datasets generated according to different unknown behavior policies. Naïvely combining a standard offline RL approach with a standard federated learning approach to solve this problem can lead to poorly performing policies. In response, we develop the Federated Ensemble-Directed Offline Reinforcement Learning Algorithm (FEDORA), which distills the collective wisdom of the clients using an ensemble learning approach. We develop the FEDORA codebase to utilize distributed compute resources on a federated learning platform. We show that FEDORA significantly outperforms other approaches, including offline RL over the combined data pool, in various complex continuous control environments and real-world datasets. Finally, we demonstrate the performance of FEDORA in the real world on a mobile robot. We provide our code and a video of our experiments at https://github.com/DesikRengarajan/FEDORA.
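To make the contrast in the abstract concrete, the toy sketch below compares uniform federated averaging of client policy parameters with a performance-weighted (ensemble-style) average. It is only an illustration under stated assumptions: the softmax weighting rule, the `temperature` parameter, the function names, and the toy numbers are all hypothetical and are not taken from the abstract or from the FEDORA codebase.

```python
import math

def uniform_average(client_params):
    """Plain federated averaging: every client gets the same weight."""
    n = len(client_params)
    dim = len(client_params[0])
    return [sum(p[j] for p in client_params) / n for j in range(dim)]

def ensemble_weights(estimated_returns, temperature=1.0):
    """Softmax over per-client return estimates (assumed weighting rule)."""
    top = max(estimated_returns)  # subtract the max for numerical stability
    exps = [math.exp((r - top) / temperature) for r in estimated_returns]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_average(client_params, weights):
    """Average client parameters using the supplied per-client weights."""
    dim = len(client_params[0])
    return [sum(w * p[j] for w, p in zip(weights, client_params))
            for j in range(dim)]

# Hypothetical toy setup: three clients, two-dimensional "policy parameters".
# Clients 0 and 2 hold data from strong behavior policies, client 1 from a weak one.
client_params = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
estimated_returns = [100.0, 10.0, 95.0]

weights = ensemble_weights(estimated_returns, temperature=10.0)
uniform = uniform_average(client_params)
weighted = weighted_average(client_params, weights)
```

In this toy setup the uniform average is pulled toward the weak client's parameters, while the weighted average stays close to the two clients with high estimated returns. That is the intuition behind directing the federation by client quality rather than treating all clients equally.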

Paper Structure (33 sections, 13 equations, 16 figures, 2 algorithms)

Introduction
Related Work
Preliminaries
Federated Offline Reinforcement Learning
Issues with Federated Offline RL
FEDORA Design Approach
Ensemble-Directed Learning over Client Policies
Federated Optimism for Critic Training
Proximal Policy Update for Heterogeneous Data
Decaying the Influence of Local Data
Experimental Evaluation
Experiments on Simulated Environments
Real-World Experiments on TurtleBot
Conclusion
Ethics Statement and Societal Impacts
...and 18 more sections

Figures (16)

Figure 1: Performance comparison of federated and centralized offline RL algorithms.
Figure 2: Evaluation of algorithms on different MuJoCo environments.
Figure 3: Comparison of FEDORA and centralized training with heterogeneous data.
Figure 4: Effect of varying the number of (a) local gradient steps, (b) participating clients in each round, and (c) expert clients in FEDORA.
Figure 5: Evaluation of FEDORA and other federated baselines for a mobile robot navigation task in the presence of an obstacle.
...and 11 more figures
