Measuring Privacy Loss in Distributed Spatio-Temporal Data

Tatsuki Koga, Casey Meehan, Kamalika Chaudhuri

TL;DR

This work proposes an alternative privacy loss against location reconstruction attacks by an informed adversary, and demonstrates that this privacy loss better reflects intuitions on individual privacy violation in the distributed spatio-temporal setting.

Abstract

Statistics about traffic flow and people's movement gathered from multiple geographical locations in a distributed manner are the driving force powering many applications, such as traffic prediction, demand prediction, and restaurant occupancy reports. However, these statistics are often based on sensitive location data of people, and hence privacy has to be preserved while releasing them. The standard way to do this is via differential privacy, which guarantees a form of rigorous, worst-case, person-level privacy. In this work, motivated by several counter-intuitive features of differential privacy in distributed location applications, we propose an alternative privacy loss against location reconstruction attacks by an informed adversary. Our experiments on real and synthetic data demonstrate that our privacy loss better reflects our intuitions on individual privacy violation in the distributed spatio-temporal setting.

Paper Structure

This paper contains 35 sections, 3 theorems, 14 equations, and 8 figures.

Key Result

Proposition 1

There exists an algorithm that computes $\hat{X}^\mathrm{MAP}$ in $\mathcal{O}(TM^2)$ time and $\mathcal{O}(TM)$ space.
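The stated bounds match those of a Viterbi-style dynamic program over $T$ time steps and $M$ locations. The excerpt does not say which algorithm the authors use, so the following is only a minimal sketch under assumed inputs: a person's movement is modeled as a Markov chain, and the names `log_prior`, `log_trans`, and `log_lik` (log-probabilities of the initial location, of transitions, and of the published observations given each location) are hypothetical.

```python
import numpy as np


def map_trajectory(log_prior, log_trans, log_lik):
    """Most probable location sequence under an assumed hidden Markov model.

    log_prior: shape (M,),   log P(X_1 = i)
    log_trans: shape (M, M), log P(X_{t+1} = j | X_t = i) at index [i, j]
    log_lik:   shape (T, M), log P(observation at time t | X_t = i)
    Returns an integer array of length T.
    """
    T, M = log_lik.shape
    # score[j] = best log-probability of any path ending in location j
    score = log_prior + log_lik[0]
    # Back-pointers take O(TM) space.
    backptr = np.zeros((T, M), dtype=np.int64)
    for t in range(1, T):
        # cand[i, j] = score of being in i at t-1 and moving to j: O(M^2) per step
        cand = score[:, None] + log_trans
        backptr[t] = np.argmax(cand, axis=0)
        score = cand[backptr[t], np.arange(M)] + log_lik[t]
    # Trace the best path backwards from the best final location.
    path = np.empty(T, dtype=np.int64)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path
```

The loop performs $T-1$ updates of cost $\mathcal{O}(M^2)$ each and stores one $T \times M$ back-pointer table, which is where the $\mathcal{O}(TM^2)$ time and $\mathcal{O}(TM)$ space in this sketch come from.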

Figures (8)

  • Figure 1: Our distributed spatial counting problem at a fixed time step. Each circle represents a location: a solid red circle indicates the count is published from the location, and dotted black circles indicate it is not. Blue triangles and yellow stars represent people. Since there are three triangles and one star within the red circle, the published count at this time is 4.
  • Figure 2: Our spatio-temporal counting problem at times $t$ and $t+1$. Each circle represents a location: a solid red circle indicates the count is published from the location, and dotted black circles indicate it is not. A blue triangle and yellow stars represent people. Since there is one triangle and two stars within the red circle at time $t$, the published count is 3. Person 1, who is labeled with "1", is likely to stay in the same location at the next time step. Person 2, who is labeled with "2", is likely to move to other locations at the next time step.
  • Figure 3: Relationship between the privacy loss upper bound in Section \ref{sec:tighter-ub} and each individual's frequency of visits to a fixed activated sensor location, on the Foursquare and Gowalla datasets. Note that for this plot, we choose and fix a single sensor to be activated. We observe that those who have visited the fixed sensor location frequently have high privacy loss. The privacy loss for those who have rarely visited the fixed sensor is scattered between 0 and 1, since the privacy loss depends on other factors as well.
  • Figure 4: Relationship between the privacy loss upper bound in Section \ref{sec:tighter-ub} and the spectral gap of each individual's transition matrix on the Foursquare and Gowalla datasets. Note that a higher spectral gap of the transition matrix means that the Markov chain induced by the person's trajectory mixes rapidly; that is, the person is fast-moving. We observe a negative correlation between the privacy loss and the spectral gap. Since the privacy loss depends on other factors as well, some people have transition matrices with small spectral gaps yet small privacy loss.
  • Figure 5: Privacy loss for the adversarial estimators and upper bounds on the simulated data. We sweep parameters: the number of locations $M$ (upper left), the number of time steps the adversary needs to guess correctly for success (upper right), the number of time steps $T$ with $s=T/2$ (lower left), and $T$ with $s=T-2$ (lower right). The privacy loss generally decreases as $M$ and the number of time steps for the adversary's success grow. As $T$ grows, it stays almost constant when $s=T/2$ and increases when $s=T-2$.
  • ...and 3 more figures
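The spectral gap referred to in the Figure 4 caption can be illustrated with a short sketch. The excerpt does not give the paper's exact definition, so this assumes the common absolute spectral gap $1 - |\lambda_2|$, where $\lambda_2$ is the eigenvalue of the row-stochastic transition matrix with the second-largest magnitude; the function name `spectral_gap` is hypothetical.

```python
import numpy as np


def spectral_gap(P):
    """Absolute spectral gap 1 - |lambda_2| of a row-stochastic matrix P.

    Assumption: this is the definition of "spectral gap" in use; the
    source excerpt does not state which variant the paper adopts.
    """
    P = np.asarray(P, dtype=float)
    # Eigenvalue magnitudes, largest first; the largest is 1 for a stochastic matrix.
    mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - mags[1]
```

Under this definition, a person who almost always stays put (a transition matrix close to the identity) has a gap near 0, while a person whose next location is nearly independent of the current one has a gap near 1.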

Theorems & Definitions (4)

  • Definition 1: $(\epsilon,\delta)$-Differential Privacy \cite{dwork_our_2006-1}
  • Proposition 1
  • Proposition 2
  • Theorem 1
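
For reference, $(\epsilon,\delta)$-differential privacy, named in Definition 1 above, is conventionally stated as the inequality below. The symbols $\mathcal{A}$ (a randomized algorithm), $D, D'$ (neighboring datasets), and $S$ (a set of outputs) are notation assumed here; the excerpt does not show the paper's own notation or its neighboring relation.

```latex
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}(D') \in S] + \delta
\qquad \text{for all neighboring datasets } D, D' \text{ and all measurable sets } S .
```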