Differential Privacy for Symbolic Trajectories via the Permute-and-Flip Mechanism

Alexander Benvenuti, Huaiyuan Rao, Matthew Hale

Abstract

Privacy techniques have been developed for data-driven systems, but systems with non-numeric data cannot use typical noise-adding techniques. Therefore, we develop a new mechanism for privatizing state trajectories of symbolic systems that may be represented as words over a finite alphabet. Such systems include Markov chains, Markov decision processes, and finite-state automata, and we protect their symbolic trajectories with differential privacy. The mechanism we develop randomly selects a private approximation to be released in place of the original sensitive word, with a bias towards low-error private words. This work is based on the permute-and-flip mechanism for differential privacy, which can be applied to non-numeric data. However, a naïve implementation would have to enumerate an exponentially large list of words to generate a private word. As a result, we develop a new mechanism that generates private words without ever needing to enumerate such a list. We prove that the accuracy of our mechanism is never worse than the prior state of the art, and we empirically show on a real traffic dataset that it introduces up to $55\%$ less error than the prior state of the art under a conventional privacy implementation.
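The permute-and-flip mechanism the abstract builds on admits a compact description: randomly permute the candidate outputs, then walk the permutation and accept each candidate with a probability that decays exponentially in its utility gap from the best candidate. A minimal Python sketch of this selection rule, for a generic finite candidate set (the names `permute_and_flip` and `utility` are illustrative, not the paper's implementation, which avoids enumerating candidates):

```python
import math
import random

def permute_and_flip(candidates, utility, epsilon, sensitivity):
    """Permute-and-flip selection (McKenna & Sheldon, 2020), sketched for
    an explicitly enumerated candidate set. `utility` maps a candidate to
    a real-valued score; higher is better."""
    u_max = max(utility(c) for c in candidates)
    pool = list(candidates)
    random.shuffle(pool)                       # "permute"
    for c in pool:                             # "flip" a biased coin per candidate
        p = math.exp(epsilon * (utility(c) - u_max) / (2 * sensitivity))
        if random.random() <= p:
            return c
    # Unreachable: a maximal-utility candidate is accepted with probability 1.
```

Note that any candidate attaining `u_max` has acceptance probability `exp(0) = 1`, so the loop always terminates with a returned candidate; the paper's contribution is, in effect, realizing this sampling over exponentially many words without materializing the list `candidates`.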

Paper Structure

This paper contains 17 sections, 4 theorems, 21 equations, 5 figures, and 2 algorithms.

Key Result

Lemma 1

Fix an alphabet $\Sigma$, a word length $n\in\mathbb{N}^+$, a set $L\subseteq \Sigma^n$, and an adjacency parameter $b\in\mathbb{N}^+$. Then the sensitivity of the utility function $u$ is bounded as follows.
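The bound itself does not survive this extraction, but under the utility commonly used in this line of work, the negative Hamming distance between input and output words (an assumption on our part, consistent with the mechanism's stated bias toward low-error words), a bound of $b$ follows from the triangle inequality for the Hamming metric $d_H$:

```latex
% Assumed utility: u(w, v) = -d_H(w, v), with adjacency meaning d_H(w, w') \le b.
\begin{align*}
\Delta u
  &= \max_{v \in L}\;
     \max_{\substack{w, w' \in \Sigma^n \\ d_H(w, w') \le b}}
     \bigl| u(w, v) - u(w', v) \bigr|
   = \max_{v}\; \max_{w, w'} \bigl| d_H(w', v) - d_H(w, v) \bigr| \\
  &\le \max_{\substack{w, w' \\ d_H(w, w') \le b}} d_H(w, w')
   \le b,
\end{align*}
```

where the first inequality is the reverse triangle inequality applied to the metric $d_H$.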

Figures (5)

  • Figure 3: Modified Hamming distance automaton for all private output words of length $3$ and Hamming distance $2$ from the sensitive input word $abc$ over the alphabet $\{a, b, c\}$. Each circle represents a state where the base of the number in the circle is the current length of the private output word and the superscript is the number of errors made on the way to reaching that state. Each arrow represents a feasible transition, and the state with the double circle represents the accepting state. Probabilities from the policy $\mu$ are in red.
  • Figure 4: The upper and lower bounds for the expected error $\mathbb{E}\left[\ell\right]$ from the accuracy-bounds theorem with (a) varying $\epsilon\in[0.1, 10]$ and fixed $n = 5$ and $m = 2$, (b) varying $n\in\{2, 3,\ldots, 20\}$ and fixed $\epsilon = 5$ and $m = 2$, and (c) varying $m\in\{2, 3,\ldots, 10\}$ and fixed $\epsilon = 5$ and $n = 5$. The bounds are tightest with large $m$ and $n$, and small $\epsilon$.
  • Figure 5: A P-MNFA (b) for a four-state Markov chain (a).
  • Figure 6: Sample private output words with $n = 14$ through Gainesville, Florida with SW 34th St as the initial state (green point). At $\epsilon = 5$, the average error is less than $1$, and private output words are often close to the sensitive input word. At stronger privacy, i.e., $\epsilon =0.5$, the average error approaches $13$, and private output words often differ in every state from the input word, which is the case for the sampled private output word with $\epsilon = 0.5$ shown here.
  • Figure 7: Error comparison between the proposed mechanism and that of chen2023differentialsymbolic, with the initial state from Figure 6, for privacy parameters in the range $\epsilon \in [0.1, 10]$ (top) and $\epsilon \in [3, 6]$ (bottom). Under strong privacy, e.g., $\epsilon = 0.1$, both mechanisms yield nearly identical average errors, but at more common privacy levels, e.g., $\epsilon = 3$, the proposed mechanism shows a $25\%$ reduction in error relative to chen2023differentialsymbolic.
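Figure 3's automaton compactly represents the set of output words at a given Hamming distance from the input, which a naïve approach would have to enumerate. The size of that set has a simple closed form: over an alphabet of size $s$, the number of length-$n$ words at Hamming distance exactly $k$ from a fixed word is $\binom{n}{k}(s-1)^k$. A short sketch checking this against Figure 3's setting (function name is ours, for illustration):

```python
from math import comb

def words_at_distance(n, k, alphabet_size):
    """Number of length-n words at Hamming distance exactly k from a fixed
    word: choose which k positions differ, then pick one of the
    (alphabet_size - 1) differing symbols at each chosen position."""
    return comb(n, k) * (alphabet_size - 1) ** k

# Figure 3's setting: words of length 3 at distance 2 from "abc" over {a, b, c}.
print(words_at_distance(3, 2, 3))  # 12 such words, out of 3**3 = 27 total
```

Summing over all distances $k = 0, \ldots, n$ recovers the full $s^n$ words, which is the exponentially large list the paper's mechanism avoids enumerating.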

Theorems & Definitions (10)

  • Definition 1: Finite State Automaton
  • Definition 2: Word Adjacency; jones2019towards
  • Definition 3: Word Differential Privacy; jones2019towards
  • Lemma 1: Sensitivity; jones2019towards
  • Definition 4: Permute-and-Flip; mckenna2020permute
  • Definition 5: Modified Hamming distance automaton; chen2023differentialsymbolic
  • Theorem 1
  • Theorem 2: Solution to the accuracy problem
  • Definition 6: Product Modified Hamming Distance NFA; chen2023differentialsymbolic
  • Theorem 3