Federated reinforcement learning for robot motion planning with zero-shot generalization

Zhenyuan Yuan; Siyuan Xu; Minghui Zhu

TL;DR

A federated reinforcement learning framework that lets multiple learners and a central server learn collaboratively without sharing raw data, and that leverages derived zero-shot generalization guarantees on arrival time and safety.

Abstract

This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., no data collection or policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning among multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy to the learners. Each learner then selects between its local control policy and the one from the Cloud for the next iteration. The proposed framework leverages the derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement and optimality gap are also provided. Monte Carlo simulations are conducted to evaluate the proposed framework.
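The upload, Cloud update, and learner selection steps described in the abstract can be pictured with a minimal sketch. It is not the paper's FedGen algorithm: the names `Upload`, `cloud_update`, `learner_fusion`, and `one_round` are hypothetical, the policy is a plain list of numbers standing in for real policy parameters, and the selection rule (keep whichever policy has the smaller estimated normalized arrival time) is an assumption made for illustration. The paper's actual rule also accounts for safety and its generalization guarantees.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Upload:
    """What a learner sends to the Cloud: a policy and a scalar estimate, no raw data."""
    learner_id: int
    policy: Sequence[float]    # stand-in for policy parameters (assumption)
    est_arrival_time: float    # estimated normalized arrival time; lower assumed better


def cloud_update(uploads: List[Upload]) -> Upload:
    """Cloud step (assumed rule): pick the upload with the smallest estimate."""
    return min(uploads, key=lambda u: u.est_arrival_time)


def learner_fusion(local: Upload, broadcast: Upload) -> Upload:
    """Learner step (assumed rule): adopt the broadcast policy only if strictly better."""
    if broadcast.est_arrival_time < local.est_arrival_time:
        return Upload(local.learner_id, broadcast.policy, broadcast.est_arrival_time)
    return local


def one_round(uploads: List[Upload]) -> List[Upload]:
    """One iteration: upload, Cloud selection, broadcast, per-learner selection."""
    best = cloud_update(uploads)
    return [learner_fusion(u, best) for u in uploads]


uploads = [
    Upload(0, [0.1], 0.8),
    Upload(1, [0.2], 0.5),
    Upload(2, [0.3], 0.9),
]
result = one_round(uploads)
```

In this toy run, learners 0 and 2 adopt learner 1's policy while learner 1 keeps its own. Only policies and scalar estimates cross the network, which is the sense in which raw data stays local.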

Paper Structure (25 sections, 11 theorems, 66 equations, 13 figures, 3 tables, 1 algorithm)


Introduction
Problem Formulation
Environment-specific motion planning
Robot motion planning with zero-shot generalization
Federated reinforcement learning
Algorithm Statement
The FedGen algorithm
Learner-based update
Cloud update
Learner-based fusion
Performance guarantees
Discussion
Proofs
Proof of Theorem (monotonic optimality)
Proof of Theorem (Pareto improvement)
...and 10 more sections

Key Result

Theorem 3.1

Suppose the assumptions on the stochastic environment and the stochastic initialization hold. The following properties are true for all $i\in\mathcal{V}$:

Figures (13)

Figure 1: Implementation of FedGen for learner $i$ in iteration $k$
Figure 2: Parameter update logic at each iteration
Figure 3: A sample environment in PyBullet
Figure 4: Generalized performances to unseen environments
Figure 5: Comparison between initial policy, locally converged policy and globally converged policy
...and 8 more figures

Theorems & Definitions (11)

Theorem 3.1
Lemma 3.5
Theorem 3.6
Theorem 3.7
Theorem 4.1
Lemma 4.2
Lemma 4.3
Lemma 4.4
Lemma 4.5
Lemma 4.6
...and 1 more
