Differentially Private Data-Driven Markov Chain Modeling

Alexander Benvenuti; Brandon Fallin; Calvin Hawkins; Brendan Bialy; Miriam Dennis; Warren Dixon; Matthew Hale

TL;DR

We develop a method for protecting user data used to formulate a Markov chain model, together with a method for privatizing database queries whose outputs are elements of the unit simplex, and we prove that this method is differentially private.

Abstract

Markov chains model a wide range of user behaviors. However, generating accurate Markov chain models requires substantial user data, and sharing these models without privacy protections may reveal sensitive information about the underlying user data. We introduce a method for protecting user data used to formulate a Markov chain model. First, we develop a method for privatizing database queries whose outputs are elements of the unit simplex, and we prove that this method is differentially private. We quantify its accuracy by bounding the expected KL divergence between private and non-private queries. We extend this method to privatize stochastic matrices whose rows are each a simplex-valued query of a database, which includes data-driven Markov chain models. To assess their accuracy, we analytically bound the change in the stationary distribution and the change in the convergence rate between a non-private Markov chain model and its private form. Simulations show that under a typical privacy implementation, our method yields less than 2% error in the stationary distribution, indicating that our approach to private modeling faithfully captures the behavior of the systems we study.
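The pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the Dirichlet mechanism with parameter $k>0$ and simplex-valued input $p$ outputs a sample from a Dirichlet distribution with concentration $k\,p$, and it applies that mechanism independently to each row of a stochastic matrix. The function names, the example matrix, and the value of `k` are hypothetical, and the sketch does not compute the privacy level $\epsilon$ implied by `k`.

```python
import numpy as np


def dirichlet_mechanism(p, k, rng):
    """Sample a private simplex vector from Dirichlet(k * p).

    Assumption: this is the form of the Dirichlet mechanism; every
    entry of p must be strictly positive for the sample to be defined.
    """
    p = np.asarray(p, dtype=float)
    return rng.dirichlet(k * p)


def privatize_stochastic_matrix(P, k, rng):
    """Privatize each row of a row-stochastic matrix independently."""
    return np.vstack([dirichlet_mechanism(row, k, rng) for row in P])


def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, idx])
    return pi / pi.sum()


def tv_distance(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()


rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix with strictly positive entries.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
])

P_private = privatize_stochastic_matrix(P, k=200.0, rng=rng)
pi = stationary_distribution(P)
pi_private = stationary_distribution(P_private)
tv = tv_distance(pi, pi_private)
print("TV distance between stationary distributions:", tv)
```

Larger values of `k` concentrate the Dirichlet samples more tightly around the true rows, so the private stationary distribution moves closer to the non-private one. The corresponding privacy guarantee is weaker, which is the accuracy and privacy trade-off the paper quantifies.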

Paper Structure (27 sections, 14 theorems, 89 equations, 4 figures, 1 table, 2 algorithms)

INTRODUCTION
PRELIMINARIES AND PROBLEM FORMULATION
Unit Simplex and Stochastic Matrices
Markov Chains
Differential Privacy
Databases and Queries
Problem Statements
Dirichlet Mechanism for Differential Privacy of Simplex-Valued Queries
Computing epsilon
DIFFERENTIAL PRIVACY FOR MARKOV CHAIN MODELING
STATIONARY DISTRIBUTION AND CONVERGENCE RATE
NUMERICAL SIMULATIONS
Class Grade Distribution
New York City Taxis
Conclusion
...and 12 more sections

Key Result

Theorem 1

Fix a database $D$ with $N\in\mathbb{N}$ entries, a category set $\rho$ such that $|\rho| = n\in\mathbb{N}$, and $\eta>0$. Let $C(D, \rho)\in \Delta_{n}^{(\eta)}$ be the count vector as defined in eq:count. Let Assumptions ass:sat-ass:k hold. Then the Dirichlet mechanism with parameter $k>0$ and inp and

Figures (4)

Figure 1: KL divergence between a privatized grade distribution and its true grade distribution. The upper bound in Corollary \ref{cor:kld} becomes increasingly tight as privacy weakens, though it is close to the true value for all $\epsilon$. The distributions of $\tilde{C}$ and $C(D, \rho)$ remain similar even under strong privacy, highlighting that Algorithm \ref{algo:private_count} produces outputs with strong privacy that exhibit high accuracy.
Figure 2: Stationary distribution of the Markov chain formed by New York City taxi drop-offs and pickups. We see that, at the strongest privacy level $\epsilon = 3.73$, the stationary distribution of the privatized Markov chain model remains close to that of the non-private model.
Figure 3: Change in the stationary distribution with varying privacy strength. Even with $\epsilon = 3.73$, we find the average TV distance change in the stationary distribution is minimal.
Figure 4: Average relative error between $\delta$ and $\hat{\delta}$.

Theorems & Definitions (23)

Definition 1: Unit Simplex
Definition 2: Bordered Unit Simplex
Definition 3: Adjacency
Definition 4: Differential Privacy; dwork2014algorithmic
Example 1: City Traffic
Definition 5: Dirichlet Mechanism; gohari2021differential
Definition 6: Probabilistic Differential Privacy; machanavajjhala2008privacy, meiser2018approximate
Theorem 1: Solution to Problem \ref{prob:count}
Theorem 2: Solution to Problem \ref{prob:sim_vec}
Corollary 1: Alternative Solution to Problem \ref{prob:sim_vec}
...and 13 more
