Differentially Private Deep Model-Based Reinforcement Learning

Alexandre Rio; Merwan Barlier; Igor Colin; Albert Thomas

TL;DR

This work introduces PriMORL, a model-based RL algorithm with formal differential privacy guarantees. It enables the training of private RL agents on offline continuous control tasks with deep function approximators, whereas existing methods are limited to simpler tabular and linear Markov Decision Processes (MDPs).

Abstract

We address private deep offline reinforcement learning (RL), where the goal is to train, on standard control tasks, a policy that is differentially private (DP) with respect to individual trajectories in the dataset. To achieve this, we introduce PriMORL, a model-based RL algorithm with formal differential privacy guarantees. PriMORL first learns an ensemble of trajectory-level DP models of the environment from offline data. It then optimizes a policy on the penalized private model, without any further interaction with the system or access to the dataset. In addition to establishing strong theoretical foundations, we demonstrate empirically that PriMORL enables the training of private RL agents on offline continuous control tasks with deep function approximators, whereas existing methods are limited to simpler tabular and linear Markov Decision Processes (MDPs). We furthermore outline the trade-offs involved in achieving privacy in this setting.
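The second stage described above, policy optimization on a penalized model, can be illustrated with a minimal sketch. It assumes that the penalty is subtracted from the ensemble's mean predicted reward and that uncertainty is measured as the largest pairwise distance between the ensemble members' next-state predictions. The function name `penalized_reward`, the coefficient `lam`, and this particular uncertainty measure are illustrative assumptions, not definitions taken from the paper.

```python
import numpy as np

def penalized_reward(reward_preds, next_state_preds, lam=1.0):
    """Pessimistic reward for a single (state, action) pair.

    reward_preds:     shape (E,), reward predicted by each of E ensemble models.
    next_state_preds: shape (E, d), next state predicted by each model.
    lam:              hypothetical penalty coefficient (assumption).
    """
    # Pairwise differences between ensemble predictions, shape (E, E, d).
    diffs = next_state_preds[:, None, :] - next_state_preds[None, :, :]
    # Uncertainty = largest Euclidean distance between any two models.
    u = np.linalg.norm(diffs, axis=-1).max()
    # Penalize the mean predicted reward by the disagreement.
    return reward_preds.mean() - lam * u, u
```

A policy trained on such penalized rewards is discouraged from visiting regions where the ensemble members disagree, which is where a noisy private model is least reliable.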

Paper Structure (38 sections, 8 theorems, 21 equations, 7 figures, 6 tables, 4 algorithms)

Introduction
Contributions.
Related Work
Preliminaries
Offline Model-Based Reinforcement Learning
Differential Privacy
Differentially Private Model-Based Offline Reinforcement Learning
Trajectory-level Privacy in Offline Reinforcement Learning
Model Learning with Differential Privacy
Trajectory-level DP Training for Model Ensembles
Privacy Guarantees for the Model
Policy Optimization under a Private Model
Impact of Privacy on Policy Optimization
Mitigating Private Model Uncertainty
Private Policy Optimization
...and 23 more sections

Key Result

Theorem 4.2

$(\epsilon, \delta)$-TDP guarantees for dynamics model. Given $\delta \in (0,1)$, noise multiplier $z$, sampling ratio $q$ and number of training iterations $T$, let $\epsilon := \epsilon^{\text{MA}}\left(z, q, T, \delta\right)$ be the privacy budget computed by the moments accounting method from Ab
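To make the roles of the noise multiplier $z$ and the sampling ratio $q$ concrete, here is a hedged sketch of one trajectory-level private aggregation step in the style of DP-FedAvg: each trajectory is sampled with probability $q$, its update is clipped, and Gaussian noise scaled by $z$ is added to the sum. The clipping norm `clip_norm`, the function name `private_update`, and the fixed-denominator averaging are assumptions made for illustration; the paper's exact procedure may differ. The sketch does not compute $\epsilon$, which requires running a privacy accountant over all $T$ iterations.

```python
import numpy as np

def private_update(per_traj_updates, clip_norm, z, q, rng):
    """One noisy aggregation step over per-trajectory model updates.

    per_traj_updates: shape (n, d), one update vector per trajectory.
    clip_norm:        hypothetical L2 clipping bound (assumption).
    z:                noise multiplier.
    q:                per-trajectory sampling probability.
    rng:              a numpy.random.Generator.
    """
    n, d = per_traj_updates.shape
    # Poisson sampling: include each trajectory independently with probability q.
    mask = rng.random(n) < q
    total = np.zeros(d)
    for delta in per_traj_updates[mask]:
        norm = np.linalg.norm(delta)
        # Scale the update down so its L2 norm is at most clip_norm.
        total += delta * min(1.0, clip_norm / max(norm, 1e-12))
    # Gaussian noise calibrated to the per-trajectory sensitivity clip_norm.
    noise = rng.normal(0.0, z * clip_norm, size=d)
    # Divide by the expected sample size q * n (fixed-denominator estimator).
    return (total + noise) / (q * n)
```

Because clipping bounds the contribution of any single trajectory, the noise protects whole trajectories rather than individual transitions, which is the unit of privacy the theorem refers to.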

Figures (7)

Figure 1: PriMORL with its two main components: 1) private model training and 2) MBPO.
Figure 2: Learning curves on Pendulum (left), Balance (middle) and Swingup (right).
Figure 3: Comparison of policy performance with $u_\text{MA}$ and $u_\text{MPD}$ for a fixed model. We measure the average performance of the policy over the last 10 epochs of training. Average and confidence intervals are computed over 5 random seeds.
Figure 4: Policy performance on Pendulum as a function of the privacy budget $\epsilon$. We measure the average performance of the policy over the last 5 epochs of training. Average and confidence intervals are computed over 5 random seeds.
Figure 5: Learning curves for the SAC policy on HalfCheetah (right). Policy performance (episodic return) is evaluated in the true MDP at the end of each training epoch, over 10 evaluation episodes with different random seeds.
...and 2 more figures

Theorems & Definitions (13)

Definition 3.1
Definition 4.1
Theorem 4.2
Proposition 4.2
Proposition 4.2
Theorem 4.3
Theorem A.1
proof
Theorem A.1
proof
...and 3 more
