
Differentially Private Data-Driven Markov Chain Modeling

Alexander Benvenuti, Brandon Fallin, Calvin Hawkins, Brendan Bialy, Miriam Dennis, Warren Dixon, Matthew Hale

TL;DR

The authors develop a method for protecting the user data used to formulate a Markov chain model. It is built on a mechanism for privatizing database queries whose outputs are elements of the unit simplex, and they prove that this mechanism is differentially private.

Abstract

Markov chains model a wide range of user behaviors. However, generating accurate Markov chain models requires substantial user data, and sharing these models without privacy protections may reveal sensitive information about the underlying user data. We introduce a method for protecting user data used to formulate a Markov chain model. First, we develop a method for privatizing database queries whose outputs are elements of the unit simplex, and we prove that this method is differentially private. We quantify its accuracy by bounding the expected KL divergence between private and non-private queries. We extend this method to privatize stochastic matrices whose rows are each a simplex-valued query of a database, which includes data-driven Markov chain models. To assess their accuracy, we analytically bound the change in the stationary distribution and the change in the convergence rate between a non-private Markov chain model and its private form. Simulations show that under a typical privacy implementation, our method yields less than 2% error in the stationary distribution, indicating that our approach to private modeling faithfully captures the behavior of the systems we study.
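The pipeline described in the abstract privatizes each simplex-valued row and then compares stationary distributions. It can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the Dirichlet mechanism releases one sample from $\mathrm{Dir}(kx)$ for an input $x$, as in gohari2021differential, and that every row of the stochastic matrix is privatized independently with the same $k$. The function names and the 3-state matrix are hypothetical. All transition probabilities are strictly positive because Dirichlet parameters must be positive.

```python
import numpy as np


def dirichlet_mechanism(x, k, rng):
    """Release one private copy of a simplex-valued vector x.

    Assumption: the release is a single sample from Dir(k * x), the form of
    the Dirichlet mechanism in gohari2021differential. Every entry of x must be
    strictly positive so that all Dirichlet parameters are positive.
    """
    return rng.dirichlet(k * np.asarray(x, dtype=float))


def privatize_stochastic_matrix(P, k, rng):
    """Privatize a row-stochastic matrix by applying the mechanism row by row.

    Assumption: rows are privatized independently with a shared parameter k.
    """
    return np.vstack([dirichlet_mechanism(row, k, rng) for row in P])


def stationary_distribution(P):
    """Return pi with pi @ P = pi and sum(pi) = 1.

    Uses the eigenvector of P.T whose eigenvalue is closest to 1; this assumes
    the chain is irreducible so that the stationary distribution is unique.
    """
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()  # dividing by the sum also fixes the sign


def tv_distance(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())


# Hypothetical 3-state chain with strictly positive transition probabilities.
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

rng = np.random.default_rng(seed=0)
P_priv = privatize_stochastic_matrix(P, k=500.0, rng=rng)
change = tv_distance(stationary_distribution(P), stationary_distribution(P_priv))
print(f"TV distance between stationary distributions: {change:.4f}")
```

Larger $k$ concentrates each release around its input, because component $i$ of $\mathrm{Dir}(kx)$ has variance $x_i(1-x_i)/(k+1)$. In this sketch $k$ therefore trades accuracy against privacy. The mapping from $k$ to a privacy level $\epsilon$ comes from the paper's theorems and is not reproduced here.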

Paper Structure (27 sections, 14 theorems, 89 equations, 4 figures, 1 table, 2 algorithms)

Key Result

Theorem 1

Fix a database $D$ with $N\in\mathbb{N}$ entries, a category set $\rho$ such that $|\rho| = n\in\mathbb{N}$, and $\eta>0$. Let $C(D, \rho)\in \Delta_{n}^{(\eta)}$ be the count vector defined in \eqref{eq:count}, and let Assumptions \ref{ass:sat}-\ref{ass:k} hold. Then the Dirichlet mechanism with parameter $k>0$ and …
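Theorem 1 concerns a single simplex-valued query. The sketch below builds a count vector and releases one private copy of it. Because eq:count and Definition 2 are not reproduced here, it rests on two hypothetical readings. First, $C(D, \rho)$ is taken to be the fraction of database entries that fall in each category of $\rho$. Second, membership in $\Delta_n^{(\eta)}$ is taken to mean that the entries sum to 1 and each one is at least $\eta$. The toy database, the value of $k$, and all function names are illustrative.

```python
from collections import Counter

import numpy as np


def count_vector(entries, categories):
    """Hypothetical stand-in for C(D, rho): the fraction of entries per category.

    Assumption: eq:count normalizes category counts by the database size N.
    """
    counts = Counter(entries)
    n_entries = len(entries)
    return np.array([counts[c] / n_entries for c in categories], dtype=float)


def in_bordered_simplex(x, eta, tol=1e-12):
    """Assumed membership test for Delta_n^(eta): entries sum to 1, each >= eta."""
    x = np.asarray(x, dtype=float)
    return bool(abs(x.sum() - 1.0) <= tol and np.all(x >= eta))


# Toy database D with N = 10 entries and category set rho with n = 3.
D = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]
rho = ("A", "B", "C")
eta = 0.1

C = count_vector(D, rho)  # fractions [0.5, 0.3, 0.2]
if not in_bordered_simplex(C, eta):
    raise ValueError("count vector lies outside the assumed bordered simplex")

# One private release, assuming the Dirichlet mechanism samples Dir(k * C).
rng = np.random.default_rng(seed=1)
k = 200.0
C_tilde = rng.dirichlet(k * C)
print("non-private:", C)
print("private:    ", C_tilde)
```

The explicit membership check mirrors the hypothesis $C(D, \rho)\in \Delta_{n}^{(\eta)}$. Keeping every entry at least $\eta > 0$ also guarantees that every Dirichlet parameter $kC_i$ is strictly positive.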

Figures (4)

  • Figure 1: KL divergence between a privatized grade distribution and the true grade distribution. The upper bound in Corollary \ref{cor:kld} becomes increasingly tight as privacy weakens, and it stays close to the true value for all $\epsilon$. The distributions of $\tilde{C}$ and $C(D, \rho)$ remain similar even under strong privacy, showing that Algorithm \ref{algo:private_count} produces strongly private outputs that remain highly accurate.
  • Figure 2: Stationary distribution of the Markov chain formed by New York City taxi drop-offs and pickups. We see that, at the strongest privacy level $\epsilon = 3.73$, the stationary distribution of the privatized Markov chain model remains close to that of the non-private model.
  • Figure 3: Change in the stationary distribution as privacy strength varies. Even at $\epsilon = 3.73$, the average total variation (TV) distance between the private and non-private stationary distributions is small.
  • Figure 4: Average relative error between $\delta$ and $\hat{\delta}$.
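Figure 1 reports KL divergence between private and non-private distributions, and Figures 2 and 3 use TV distance. The sketch below estimates the expected KL divergence of a Dirichlet release by Monte Carlo, assuming that releases are drawn from $\mathrm{Dir}(kx)$. The divergence direction $D_{\mathrm{KL}}(x \,\|\, \tilde{x})$ is an assumption and may differ from the one bounded in the paper's corollary. The grade distribution is hypothetical and is not the paper's data.

```python
import numpy as np


def kl_divergence(p, q):
    """D_KL(p || q) in nats, for strictly positive probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def tv_distance(p, q):
    """Total variation distance: half the L1 distance."""
    diff = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return 0.5 * float(np.abs(diff).sum())


def mean_kl_of_release(x, k, n_samples, rng):
    """Monte Carlo estimate of E[D_KL(x || x_tilde)] with x_tilde ~ Dir(k * x).

    Assumption: the divergence direction and the Dir(k * x) release model are
    illustrative choices, not necessarily those bounded in the paper.
    """
    samples = rng.dirichlet(k * np.asarray(x, dtype=float), size=n_samples)
    return float(np.mean([kl_divergence(x, s) for s in samples]))


# Hypothetical grade distribution over four letter grades.
x = np.array([0.4, 0.3, 0.2, 0.1])
rng = np.random.default_rng(seed=2)
for k in (50.0, 500.0, 5000.0):
    estimate = mean_kl_of_release(x, k, n_samples=2000, rng=rng)
    print(f"k = {k:>6.0f}: mean KL = {estimate:.5f}")
```

TV distance always lies in $[0, 1]$, which makes it a convenient scale for comparing stationary distributions. KL divergence is unbounded, and in the direction used here it grows without limit as any sampled component $\tilde{x}_i$ approaches zero.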

Theorems & Definitions (23)

  • Definition 1: Unit Simplex
  • Definition 2: Bordered Unit Simplex
  • Definition 3: Adjacency
  • Definition 4: Differential Privacy; dwork2014algorithmic
  • Example 1: City Traffic
  • Definition 5: Dirichlet Mechanism; gohari2021differential
  • Definition 6: Probabilistic Differential Privacy; machanavajjhala2008privacy, meiser2018approximate
  • Theorem 1: Solution to Problem \ref{prob:count}
  • Theorem 2: Solution to Problem \ref{prob:sim_vec}
  • Corollary 1: Alternative Solution to Problem \ref{prob:sim_vec}
  • ...and 13 more
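For reference, the block below gives the standard forms of three objects named in this list: the unit simplex, $(\epsilon, \delta)$-differential privacy in the form of dwork2014algorithmic, and the density of a $\mathrm{Dir}(kx)$ release. These are textbook statements written in this document's notation, not quotations of the paper. The paper's Definitions 1, 4, and 5 may differ in detail, for example in how the adjacency of Definition 3 enters or in whether $\delta = 0$. The density assumes that the Dirichlet mechanism releases $\tilde{x} \sim \mathrm{Dir}(kx)$ for an input $x$ in the interior of $\Delta_n$.

```latex
\Delta_n = \Big\{ x \in \mathbb{R}^n : x_i \ge 0 \ \text{for all } i, \ \textstyle\sum_{i=1}^{n} x_i = 1 \Big\}

\mathbb{P}\big[\mathcal{M}(D) \in S\big] \le e^{\epsilon}\, \mathbb{P}\big[\mathcal{M}(D') \in S\big] + \delta
\quad \text{for all adjacent } D, D' \text{ and all measurable } S

\tilde{x} \sim \mathrm{Dir}(kx), \qquad
p(\tilde{x} \mid x) = \frac{\Gamma(k)}{\prod_{i=1}^{n} \Gamma(k x_i)} \prod_{i=1}^{n} \tilde{x}_i^{\,k x_i - 1},
\quad \text{since } \textstyle\sum_{i=1}^{n} k x_i = k
```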