Privacy Preserving Reinforcement Learning for Population Processes

Samuel Yang-Zhao; Kee Siong Ng

TL;DR

This work addresses privacy in reinforcement learning for population processes where an agent learns from population-level statistics while individuals’ data may be correlated across time. It introduces a DP-RL meta algorithm that privatizes per-step state and reward signals using a projected Laplace mechanism, and leverages Pufferfish privacy to handle correlation, establishing an equivalence to DP under $T$-fold adaptive composition. The authors prove a finite-sample-like bound on the value-function approximation error, showing it decays as the population size $N$ and the privacy budget $\epsilon$ grow, and validate the approach on simulated epidemic-control tasks with large graphs. The results indicate that reasonable privacy-utility trade-offs are achievable in population-based RL and provide a principled framework for privacy-aware RL in correlated, population-scale settings.

Abstract

We consider the problem of privacy protection in Reinforcement Learning (RL) algorithms that operate over population processes, a practical but understudied setting that includes, for example, the control of epidemics in large populations of dynamically interacting individuals. In this setting, the RL algorithm interacts with the population over $T$ time steps by receiving population-level statistics as state and performing actions that can affect the entire population at each time step. An individual's data can be collected across multiple interactions, and their privacy must be protected at all times. We clarify the Bayesian semantics of Differential Privacy (DP) in the presence of correlated data in population processes through a Pufferfish Privacy analysis. We then give a meta algorithm that can take any RL algorithm as input and make it differentially private. It does so by applying DP mechanisms to privatize the state and reward signal at each time step before the RL algorithm receives them as input. Our main theoretical result shows that the value-function approximation error when applying standard RL algorithms directly to the privatized states shrinks quickly as the population size and privacy budget increase. This shows that reasonable privacy-utility trade-offs are possible for differentially private RL algorithms in population processes. Our theoretical findings are validated by experiments on a simulated epidemic control problem over large population sizes.
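The per-step privatization described above can be illustrated with a minimal sketch. It assumes the state is a vector of compartment counts (for example susceptible, exposed, infected, recovered) that sums to a publicly known population size, so that changing one individual's data moves at most two counts by one each ($\ell_1$ sensitivity 2). The function name `privatize_counts` and the clip-and-rescale projection are hypothetical choices for illustration; the paper's projected Laplace mechanism may use a different projection.

```python
import numpy as np

def privatize_counts(counts, epsilon, sensitivity=2.0, rng=None):
    """Laplace mechanism on a count vector, then a projection onto valid states.

    Assumptions (not taken from the paper): counts sum to a public population
    size, the l1 sensitivity is 2, and the projection is clip-and-rescale.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()  # population size, assumed to be public knowledge

    # Laplace mechanism: noise scale = sensitivity / epsilon per coordinate.
    noisy = counts + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=counts.shape)

    # Projection step. It only touches the noisy output and the public n,
    # so it is post-processing and does not weaken the DP guarantee.
    clipped = np.clip(noisy, 0.0, None)
    total = clipped.sum()
    if total == 0.0:
        # Degenerate case: fall back to a uniform split of the population.
        return np.full_like(counts, n / counts.size)
    return clipped * (n / total)


if __name__ == "__main__":
    true_state = [900_000, 50_000, 30_000, 20_000]
    print(privatize_counts(true_state, epsilon=1.0, rng=np.random.default_rng(0)))
```

With a noise scale of `sensitivity / epsilon`, the perturbation of each count is independent of the population size, so its effect on the normalised state shrinks as the population grows. This is consistent with the abstract's claim that the error decreases with population size and privacy budget.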

Paper Structure (24 sections, 10 theorems, 59 equations, 4 figures, 3 tables, 4 algorithms)

Introduction
Related Work
Contributions
Preliminaries
Reinforcement Learning and Markov Decision Processes
Stochastic Population Processes
Differential Privacy
Pufferfish Privacy
Problem Setting
Model
Formalizing Privacy in the Presence of Correlated Data
Differentially Private Reinforcement Learning
Privacy Analysis
Truthfulness in Data Collection
Utility Analysis
...and 9 more sections

Key Result

Lemma 1

A family of mechanisms $\mathcal{F}$ satisfies $(\epsilon, \delta)$-differential privacy under $T$-fold adaptive composition iff every sequence of mechanisms $\mathcal{M} = (\mathcal{M}_1, \ldots, \mathcal{M}_T)$, with $\mathcal{M}_i \in \mathcal{F}$, satisfies $(\epsilon, \delta)$-Pufferfish privacy ...
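To make the role of $T$-fold adaptive composition concrete, the sketch below computes how a per-step budget accumulates over $T$ steps, and inverts that relationship numerically. It uses the standard advanced composition bound of Dwork, Rothblum and Vadhan for pure $\epsilon_0$-DP mechanisms, $\epsilon' = \sqrt{2T \ln(1/\delta')}\,\epsilon_0 + T \epsilon_0 (e^{\epsilon_0} - 1)$, which yields $(\epsilon', \delta')$-DP overall. This is an assumption for illustration: the paper's own composition and per-step formulas are not reproduced in this summary and may differ. The function names are hypothetical.

```python
import math

def advanced_composition_epsilon(eps_step, T, delta_slack):
    """Total epsilon after T adaptive uses of an eps_step-DP mechanism.

    Standard advanced composition bound; the overall guarantee is
    (returned value, delta_slack)-DP.
    """
    first = math.sqrt(2.0 * T * math.log(1.0 / delta_slack)) * eps_step
    second = T * eps_step * math.expm1(eps_step)  # expm1(x) = e^x - 1
    return first + second

def per_step_epsilon(eps_total, T, delta_slack, iters=200):
    """Per-step epsilon whose composed budget stays within eps_total.

    Found by bisection; the composed bound is increasing in eps_step.
    The returned value never exceeds the target budget.
    """
    lo, hi = 0.0, eps_total
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if advanced_composition_epsilon(mid, T, delta_slack) <= eps_total:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    T = 500_000  # horizon quoted in the Figure 3 caption
    eps0 = per_step_epsilon(eps_total=1.0, T=T, delta_slack=1e-6)
    print(eps0, advanced_composition_epsilon(eps0, T, 1e-6))
```

For long horizons the per-step budget obtained this way is much larger than the naive split `eps_total / T`, which is why composition accounting matters when an individual's data is touched at every time step.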

Figures (4)

Figure 1: Visualisation of the parameters that govern the transitions between states for individuals in the SEIRS process over contact networks.
Figure 2: A graphical model of the underlying state and action sequence under our differentially private reinforcement learning approach. The true states are unobservable.
Figure 3: Left: Target privacy vs. privacy achieved under equations (\ref{eqn:adaptive_comp_formula}) and (\ref{eqn:per_step_eps_formula}) as $\delta$ is varied, with $T = 5 \times 10^5$. Right: DP-DQN performance as $\epsilon$ is varied on graphs with 82K and 196K nodes.
Figure 4: DP-DQN performance as $\epsilon$ is varied on graphs with 82K and 196K nodes.

Theorems & Definitions (23)

Definition 1: Differential Privacy
Definition 2: $\ell_1$ sensitivity
Definition 3: Pufferfish Privacy
Example 1: Epidemic Control
Example 2: Countering Misinformation
Example 3: Malware Detection and Control
Example 4
Lemma 1
Lemma 2: Post-processing [dwork2014algorithmic]
Lemma 3: $T$-fold adaptive composition [dwork2010boosting, dwork2014algorithmic]
...and 13 more
